House Prices: Advanced Regression Techniques

Competition Description

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Goal

To predict the sales price for each house. For each ID in the test set, you must predict the value of the SalePrice variable

Metric

Submission are evaluated on the Root-Mean-Squared-Error(RMSE) between the logarithm of the predicted value and logarithm of the observed sales price. (Taking logs means that errors in predicting the expensive houses and cheap houses will effect the result equally)

In [152]:
import numpy as np
from scipy import stats
from scipy.stats import norm, skew
import datetime as dt
import math
from math import radians, cos, sin, asin,sqrt
import glob
import os
import pandas as pd
import pandas_profiling
pd.set_option('display.max_columns', None)
# Visualization
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import matplotlib.gridspec as gridspec
import seaborn as sns
#import plotly.plotly as py
import plotly.graph_objs as go
#import plotly
#plotly.tools.set_credentials_file(username='peanuttbuddha', api_key='NJTdnmJo7EwDcaxEL9mO')
import plotly.offline as offline
offline.init_notebook_mode()
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import chartify
# NOTE THAT INLINE NEEDS TO BE LAST
%matplotlib inline
# Missing Data Visualization
import missingno as msno
In [153]:
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
In [154]:
train.shape, test.shape
Out[154]:
((1460, 81), (1459, 80))
In [155]:
train.columns
Out[155]:
Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC',
       'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType',
       'SaleCondition', 'SalePrice'],
      dtype='object')
In [156]:
train.head()
Out[156]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NaN Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal 208500
1 2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NaN NaN NaN 0 5 2007 WD Normal 181500
2 3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 223500
3 4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
4 5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NaN NaN NaN 0 12 2008 WD Normal 250000

Because of the plots below I will get rid of any values where sqft > 4000 and sale price < 500k (just those 2 datapoints)

In [157]:
#train = train.drop(train[((train['GrLivArea']>4000) & (train['SalePrice']<300000)) | (train['SalePrice']>700000)].index)
train = train.drop(train[((train['GrLivArea']>4000) & (train['SalePrice']<300000))].index)

train.reset_index(drop=True)
Out[157]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NaN Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal 208500
1 2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NaN NaN NaN 0 5 2007 WD Normal 181500
2 3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 223500
3 4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
4 5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NaN NaN NaN 0 12 2008 WD Normal 250000
5 6 50 RL 85.0 14115 Pave NaN IR1 Lvl AllPub Inside Gtl Mitchel Norm Norm 1Fam 1.5Fin 5 5 1993 1995 Gable CompShg VinylSd VinylSd None 0.0 TA TA Wood Gd TA No GLQ 732 Unf 0 64 796 GasA Ex Y SBrkr 796 566 0 1362 1 0 1 1 1 1 TA 5 Typ 0 NaN Attchd 1993.0 Unf 2 480 TA TA Y 40 30 0 320 0 0 NaN MnPrv Shed 700 10 2009 WD Normal 143000
6 7 20 RL 75.0 10084 Pave NaN Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 8 5 2004 2005 Gable CompShg VinylSd VinylSd Stone 186.0 Gd TA PConc Ex TA Av GLQ 1369 Unf 0 317 1686 GasA Ex Y SBrkr 1694 0 0 1694 1 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2004.0 RFn 2 636 TA TA Y 255 57 0 0 0 0 NaN NaN NaN 0 8 2007 WD Normal 307000
7 8 60 RL NaN 10382 Pave NaN IR1 Lvl AllPub Corner Gtl NWAmes PosN Norm 1Fam 2Story 7 6 1973 1973 Gable CompShg HdBoard HdBoard Stone 240.0 TA TA CBlock Gd TA Mn ALQ 859 BLQ 32 216 1107 GasA Ex Y SBrkr 1107 983 0 2090 1 0 2 1 3 1 TA 7 Typ 2 TA Attchd 1973.0 RFn 2 484 TA TA Y 235 204 228 0 0 0 NaN NaN Shed 350 11 2009 WD Normal 200000
8 9 50 RM 51.0 6120 Pave NaN Reg Lvl AllPub Inside Gtl OldTown Artery Norm 1Fam 1.5Fin 7 5 1931 1950 Gable CompShg BrkFace Wd Shng None 0.0 TA TA BrkTil TA TA No Unf 0 Unf 0 952 952 GasA Gd Y FuseF 1022 752 0 1774 0 0 2 0 2 2 TA 8 Min1 2 TA Detchd 1931.0 Unf 2 468 Fa TA Y 90 0 205 0 0 0 NaN NaN NaN 0 4 2008 WD Abnorml 129900
9 10 190 RL 50.0 7420 Pave NaN Reg Lvl AllPub Corner Gtl BrkSide Artery Artery 2fmCon 1.5Unf 5 6 1939 1950 Gable CompShg MetalSd MetalSd None 0.0 TA TA BrkTil TA TA No GLQ 851 Unf 0 140 991 GasA Ex Y SBrkr 1077 0 0 1077 1 0 1 0 2 2 TA 5 Typ 2 TA Attchd 1939.0 RFn 1 205 Gd TA Y 0 4 0 0 0 0 NaN NaN NaN 0 1 2008 WD Normal 118000
10 11 20 RL 70.0 11200 Pave NaN Reg Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 5 1965 1965 Hip CompShg HdBoard HdBoard None 0.0 TA TA CBlock TA TA No Rec 906 Unf 0 134 1040 GasA Ex Y SBrkr 1040 0 0 1040 1 0 1 0 3 1 TA 5 Typ 0 NaN Detchd 1965.0 Unf 1 384 TA TA Y 0 0 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal 129500
11 12 60 RL 85.0 11924 Pave NaN IR1 Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 2Story 9 5 2005 2006 Hip CompShg WdShing Wd Shng Stone 286.0 Ex TA PConc Ex TA No GLQ 998 Unf 0 177 1175 GasA Ex Y SBrkr 1182 1142 0 2324 1 0 3 0 4 1 Ex 11 Typ 2 Gd BuiltIn 2005.0 Fin 3 736 TA TA Y 147 21 0 0 0 0 NaN NaN NaN 0 7 2006 New Partial 345000
12 13 20 RL NaN 12968 Pave NaN IR2 Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 6 1962 1962 Hip CompShg HdBoard Plywood None 0.0 TA TA CBlock TA TA No ALQ 737 Unf 0 175 912 GasA TA Y SBrkr 912 0 0 912 1 0 1 0 2 1 TA 4 Typ 0 NaN Detchd 1962.0 Unf 1 352 TA TA Y 140 0 0 0 176 0 NaN NaN NaN 0 9 2008 WD Normal 144000
13 14 20 RL 91.0 10652 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 2006 2007 Gable CompShg VinylSd VinylSd Stone 306.0 Gd TA PConc Gd TA Av Unf 0 Unf 0 1494 1494 GasA Ex Y SBrkr 1494 0 0 1494 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2006.0 RFn 3 840 TA TA Y 160 33 0 0 0 0 NaN NaN NaN 0 8 2007 New Partial 279500
14 15 20 RL NaN 10920 Pave NaN IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 5 1960 1960 Hip CompShg MetalSd MetalSd BrkFace 212.0 TA TA CBlock TA TA No BLQ 733 Unf 0 520 1253 GasA TA Y SBrkr 1253 0 0 1253 1 0 1 1 2 1 TA 5 Typ 1 Fa Attchd 1960.0 RFn 1 352 TA TA Y 0 213 176 0 0 0 NaN GdWo NaN 0 5 2008 WD Normal 157000
15 16 45 RM 51.0 6120 Pave NaN Reg Lvl AllPub Corner Gtl BrkSide Norm Norm 1Fam 1.5Unf 7 8 1929 2001 Gable CompShg Wd Sdng Wd Sdng None 0.0 TA TA BrkTil TA TA No Unf 0 Unf 0 832 832 GasA Ex Y FuseA 854 0 0 854 0 0 1 0 2 1 TA 5 Typ 0 NaN Detchd 1991.0 Unf 2 576 TA TA Y 48 112 0 0 0 0 NaN GdPrv NaN 0 7 2007 WD Normal 132000
16 17 20 RL NaN 11241 Pave NaN IR1 Lvl AllPub CulDSac Gtl NAmes Norm Norm 1Fam 1Story 6 7 1970 1970 Gable CompShg Wd Sdng Wd Sdng BrkFace 180.0 TA TA CBlock TA TA No ALQ 578 Unf 0 426 1004 GasA Ex Y SBrkr 1004 0 0 1004 1 0 1 0 2 1 TA 5 Typ 1 TA Attchd 1970.0 Fin 2 480 TA TA Y 0 0 0 0 0 0 NaN NaN Shed 700 3 2010 WD Normal 149000
17 18 90 RL 72.0 10791 Pave NaN Reg Lvl AllPub Inside Gtl Sawyer Norm Norm Duplex 1Story 4 5 1967 1967 Gable CompShg MetalSd MetalSd None 0.0 TA TA Slab NaN NaN NaN NaN 0 NaN 0 0 0 GasA TA Y SBrkr 1296 0 0 1296 0 0 2 0 2 2 TA 6 Typ 0 NaN CarPort 1967.0 Unf 2 516 TA TA Y 0 0 0 0 0 0 NaN NaN Shed 500 10 2006 WD Normal 90000
18 19 20 RL 66.0 13695 Pave NaN Reg Lvl AllPub Inside Gtl SawyerW RRAe Norm 1Fam 1Story 5 5 2004 2004 Gable CompShg VinylSd VinylSd None 0.0 TA TA PConc TA TA No GLQ 646 Unf 0 468 1114 GasA Ex Y SBrkr 1114 0 0 1114 1 0 1 1 3 1 Gd 6 Typ 0 NaN Detchd 2004.0 Unf 2 576 TA TA Y 0 102 0 0 0 0 NaN NaN NaN 0 6 2008 WD Normal 159000
19 20 20 RL 70.0 7560 Pave NaN Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 6 1958 1965 Hip CompShg BrkFace Plywood None 0.0 TA TA CBlock TA TA No LwQ 504 Unf 0 525 1029 GasA TA Y SBrkr 1339 0 0 1339 0 0 1 0 3 1 TA 6 Min1 0 NaN Attchd 1958.0 Unf 1 294 TA TA Y 0 0 0 0 0 0 NaN MnPrv NaN 0 5 2009 COD Abnorml 139000
20 21 60 RL 101.0 14215 Pave NaN IR1 Lvl AllPub Corner Gtl NridgHt Norm Norm 1Fam 2Story 8 5 2005 2006 Gable CompShg VinylSd VinylSd BrkFace 380.0 Gd TA PConc Ex TA Av Unf 0 Unf 0 1158 1158 GasA Ex Y SBrkr 1158 1218 0 2376 0 0 3 1 4 1 Gd 9 Typ 1 Gd BuiltIn 2005.0 RFn 3 853 TA TA Y 240 154 0 0 0 0 NaN NaN NaN 0 11 2006 New Partial 325300
21 22 45 RM 57.0 7449 Pave Grvl Reg Bnk AllPub Inside Gtl IDOTRR Norm Norm 1Fam 1.5Unf 7 7 1930 1950 Gable CompShg Wd Sdng Wd Sdng None 0.0 TA TA PConc TA TA No Unf 0 Unf 0 637 637 GasA Ex Y FuseF 1108 0 0 1108 0 0 1 0 3 1 Gd 6 Typ 1 Gd Attchd 1930.0 Unf 1 280 TA TA N 0 0 205 0 0 0 NaN GdPrv NaN 0 6 2007 WD Normal 139400
22 23 20 RL 75.0 9742 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 8 5 2002 2002 Hip CompShg VinylSd VinylSd BrkFace 281.0 Gd TA PConc Gd TA No Unf 0 Unf 0 1777 1777 GasA Ex Y SBrkr 1795 0 0 1795 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2002.0 RFn 2 534 TA TA Y 171 159 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 230000
23 24 120 RM 44.0 4224 Pave NaN Reg Lvl AllPub Inside Gtl MeadowV Norm Norm TwnhsE 1Story 5 7 1976 1976 Gable CompShg CemntBd CmentBd None 0.0 TA TA PConc Gd TA No GLQ 840 Unf 0 200 1040 GasA TA Y SBrkr 1060 0 0 1060 1 0 1 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 Unf 2 572 TA TA Y 100 110 0 0 0 0 NaN NaN NaN 0 6 2007 WD Normal 129900
24 25 20 RL NaN 8246 Pave NaN IR1 Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 8 1968 2001 Gable CompShg Plywood Plywood None 0.0 TA Gd CBlock TA TA Mn Rec 188 ALQ 668 204 1060 GasA Ex Y SBrkr 1060 0 0 1060 1 0 1 0 3 1 Gd 6 Typ 1 TA Attchd 1968.0 Unf 1 270 TA TA Y 406 90 0 0 0 0 NaN MnPrv NaN 0 5 2010 WD Normal 154000
25 26 20 RL 110.0 14230 Pave NaN Reg Lvl AllPub Corner Gtl NridgHt Norm Norm 1Fam 1Story 8 5 2007 2007 Gable CompShg VinylSd VinylSd Stone 640.0 Gd TA PConc Gd TA No Unf 0 Unf 0 1566 1566 GasA Ex Y SBrkr 1600 0 0 1600 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2007.0 RFn 3 890 TA TA Y 0 56 0 0 0 0 NaN NaN NaN 0 7 2009 WD Normal 256300
26 27 20 RL 60.0 7200 Pave NaN Reg Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 5 7 1951 2000 Gable CompShg Wd Sdng Wd Sdng None 0.0 TA TA CBlock TA TA Mn BLQ 234 Rec 486 180 900 GasA TA Y SBrkr 900 0 0 900 0 1 1 0 3 1 Gd 5 Typ 0 NaN Detchd 2005.0 Unf 2 576 TA TA Y 222 32 0 0 0 0 NaN NaN NaN 0 5 2010 WD Normal 134800
27 28 20 RL 98.0 11478 Pave NaN Reg Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 1Story 8 5 2007 2008 Gable CompShg VinylSd VinylSd Stone 200.0 Gd TA PConc Ex TA No GLQ 1218 Unf 0 486 1704 GasA Ex Y SBrkr 1704 0 0 1704 1 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2008.0 RFn 3 772 TA TA Y 0 50 0 0 0 0 NaN NaN NaN 0 5 2010 WD Normal 306000
28 29 20 RL 47.0 16321 Pave NaN IR1 Lvl AllPub CulDSac Gtl NAmes Norm Norm 1Fam 1Story 5 6 1957 1997 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock TA TA Gd BLQ 1277 Unf 0 207 1484 GasA TA Y SBrkr 1600 0 0 1600 1 0 1 0 2 1 TA 6 Typ 2 Gd Attchd 1957.0 RFn 1 319 TA TA Y 288 258 0 0 0 0 NaN NaN NaN 0 12 2006 WD Normal 207500
29 30 30 RM 60.0 6324 Pave NaN IR1 Lvl AllPub Inside Gtl BrkSide Feedr RRNn 1Fam 1Story 4 6 1927 1950 Gable CompShg MetalSd MetalSd None 0.0 TA TA BrkTil TA TA No Unf 0 Unf 0 520 520 GasA Fa N SBrkr 520 0 0 520 0 0 1 0 1 1 Fa 4 Typ 0 NaN Detchd 1920.0 Unf 1 240 Fa TA Y 49 0 87 0 0 0 NaN NaN NaN 0 5 2008 WD Normal 68500
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1428 1431 60 RL 60.0 21930 Pave NaN IR3 Lvl AllPub Inside Gtl Gilbert RRAn Norm 1Fam 2Story 5 5 2005 2005 Gable CompShg VinylSd VinylSd None 0.0 Gd TA PConc Gd Gd Av Unf 0 Unf 0 732 732 GasA Ex Y SBrkr 734 1104 0 1838 0 0 2 1 4 1 TA 7 Typ 1 Gd BuiltIn 2005.0 Fin 2 372 TA TA Y 100 40 0 0 0 0 NaN NaN NaN 0 7 2006 WD Normal 192140
1429 1432 120 RL NaN 4928 Pave NaN IR1 Lvl AllPub Inside Gtl NPkVill Norm Norm TwnhsE 1Story 6 6 1976 1976 Gable CompShg Plywood Plywood None 0.0 TA TA CBlock Gd TA No LwQ 958 Unf 0 0 958 GasA TA Y SBrkr 958 0 0 958 0 0 2 0 2 1 TA 5 Typ 0 NaN Attchd 1976.0 RFn 2 440 TA TA Y 0 60 0 0 0 0 NaN NaN NaN 0 10 2009 WD Normal 143750
1430 1433 30 RL 60.0 10800 Pave Grvl Reg Lvl AllPub Inside Gtl OldTown Norm Norm 1Fam 1Story 4 6 1927 2007 Gable CompShg Wd Sdng Wd Sdng None 0.0 TA TA BrkTil TA TA No Unf 0 Unf 0 656 656 GasA TA Y SBrkr 968 0 0 968 0 0 2 0 4 1 TA 5 Typ 0 NaN Detchd 1928.0 Unf 1 216 Fa Fa Y 0 0 0 0 0 0 NaN NaN NaN 0 8 2007 WD Normal 64500
1431 1434 60 RL 93.0 10261 Pave NaN IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 6 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 318.0 TA TA PConc Gd TA No Unf 0 Unf 0 936 936 GasA Ex Y SBrkr 962 830 0 1792 1 0 2 1 3 1 TA 8 Typ 1 TA Attchd 2000.0 Fin 2 451 TA TA Y 0 0 0 0 0 0 NaN NaN NaN 0 5 2008 WD Normal 186500
1432 1435 20 RL 80.0 17400 Pave NaN Reg Low AllPub Inside Mod Mitchel Norm Norm 1Fam 1Story 5 5 1977 1977 Gable CompShg BrkFace BrkFace None 0.0 TA TA CBlock TA TA No ALQ 936 Unf 0 190 1126 GasA Fa Y SBrkr 1126 0 0 1126 1 0 2 0 3 1 TA 5 Typ 1 Gd Attchd 1977.0 RFn 2 484 TA TA P 295 41 0 0 0 0 NaN NaN NaN 0 5 2006 WD Normal 160000
1433 1436 20 RL 80.0 8400 Pave NaN Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 6 9 1962 2005 Gable CompShg Wd Sdng Wd Sdng BrkFace 237.0 Gd Gd CBlock TA TA No Unf 0 Unf 0 1319 1319 GasA TA Y SBrkr 1537 0 0 1537 1 0 1 1 3 1 Gd 7 Typ 1 Gd Attchd 1962.0 RFn 2 462 TA TA Y 0 36 0 0 0 0 NaN GdPrv NaN 0 7 2008 COD Abnorml 174000
1434 1437 20 RL 60.0 9000 Pave NaN Reg Lvl AllPub FR2 Gtl NAmes Norm Norm 1Fam 1Story 4 6 1971 1971 Gable CompShg HdBoard HdBoard None 0.0 TA TA PConc TA TA No ALQ 616 Unf 0 248 864 GasA TA Y SBrkr 864 0 0 864 0 0 1 0 3 1 TA 5 Typ 0 NaN Detchd 1974.0 Unf 2 528 TA TA Y 0 0 0 0 0 0 NaN GdWo NaN 0 5 2007 WD Normal 120500
1435 1438 20 RL 96.0 12444 Pave NaN Reg Lvl AllPub FR2 Gtl NridgHt Norm Norm 1Fam 1Story 8 5 2008 2008 Hip CompShg VinylSd VinylSd Stone 426.0 Ex TA PConc Ex TA Av GLQ 1336 Unf 0 596 1932 GasA Ex Y SBrkr 1932 0 0 1932 1 0 2 0 2 1 Ex 7 Typ 1 Gd Attchd 2008.0 Fin 3 774 TA TA Y 0 66 0 304 0 0 NaN NaN NaN 0 11 2008 New Partial 394617
1436 1439 20 RM 90.0 7407 Pave NaN Reg Lvl AllPub Inside Gtl OldTown Artery Norm 1Fam 1Story 6 7 1957 1996 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock TA TA No GLQ 600 Unf 0 312 912 GasA TA Y FuseA 1236 0 0 1236 1 0 1 0 2 1 TA 6 Typ 0 NaN Attchd 1957.0 Unf 2 923 TA TA Y 0 158 158 0 0 0 NaN MnPrv NaN 0 4 2010 WD Normal 149700
1437 1440 60 RL 80.0 11584 Pave NaN Reg Lvl AllPub Inside Gtl NWAmes Norm Norm 1Fam SLvl 7 6 1979 1979 Hip CompShg HdBoard HdBoard BrkFace 96.0 TA TA CBlock TA TA No GLQ 315 Rec 110 114 539 GasA TA Y SBrkr 1040 685 0 1725 0 0 2 1 3 1 TA 6 Typ 1 TA Attchd 1979.0 RFn 2 550 TA TA Y 0 88 216 0 0 0 NaN NaN NaN 0 11 2007 WD Normal 197000
1438 1441 70 RL 79.0 11526 Pave NaN IR1 Bnk AllPub Inside Mod Crawfor Norm Norm 1Fam 2.5Fin 6 7 1922 1994 Gable CompShg MetalSd MetalSd None 0.0 TA TA BrkTil Ex TA No Unf 0 Unf 0 588 588 GasA Fa Y SBrkr 1423 748 384 2555 0 0 2 0 3 1 TA 11 Min1 1 Gd Detchd 1993.0 Fin 2 672 TA TA Y 431 0 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 191000
1439 1442 120 RM NaN 4426 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm TwnhsE 1Story 6 5 2004 2004 Gable CompShg VinylSd VinylSd BrkFace 147.0 Gd TA PConc Gd TA Av GLQ 697 Unf 0 151 848 GasA Ex Y SBrkr 848 0 0 848 1 0 1 0 1 1 Gd 3 Typ 1 TA Attchd 2004.0 RFn 2 420 TA TA Y 149 0 0 0 0 0 NaN NaN NaN 0 5 2008 WD Normal 149300
1440 1443 60 FV 85.0 11003 Pave NaN Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 2Story 10 5 2008 2008 Gable CompShg VinylSd VinylSd Stone 160.0 Ex TA PConc Ex TA Av GLQ 765 Unf 0 252 1017 GasA Ex Y SBrkr 1026 981 0 2007 1 0 2 1 3 1 Ex 10 Typ 1 Ex Attchd 2008.0 Fin 3 812 TA TA Y 168 52 0 0 0 0 NaN NaN NaN 0 4 2009 WD Normal 310000
1441 1444 30 RL NaN 8854 Pave NaN Reg Lvl AllPub Inside Gtl BrkSide Norm Norm 1Fam 1.5Unf 6 6 1916 1950 Gable CompShg Wd Sdng Wd Sdng None 0.0 TA TA BrkTil TA TA No Unf 0 Unf 0 952 952 Grav Fa N FuseF 952 0 0 952 0 0 1 0 2 1 Fa 4 Typ 1 Gd Detchd 1916.0 Unf 1 192 Fa Po P 0 98 0 0 40 0 NaN NaN NaN 0 5 2009 WD Normal 121000
1442 1445 20 RL 63.0 8500 Pave NaN Reg Lvl AllPub FR2 Gtl CollgCr Norm Norm 1Fam 1Story 7 5 2004 2004 Gable CompShg VinylSd VinylSd BrkFace 106.0 Gd TA PConc Gd TA Av Unf 0 Unf 0 1422 1422 GasA Ex Y SBrkr 1422 0 0 1422 0 0 2 0 3 1 Gd 7 Typ 0 NaN Attchd 2004.0 RFn 2 626 TA TA Y 192 60 0 0 0 0 NaN NaN NaN 0 11 2007 WD Normal 179600
1443 1446 85 RL 70.0 8400 Pave NaN Reg Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam SFoyer 6 5 1966 1966 Gable CompShg VinylSd VinylSd None 0.0 TA TA CBlock TA TA Gd LwQ 187 Rec 627 0 814 GasA Gd Y SBrkr 913 0 0 913 1 0 1 0 3 1 TA 6 Typ 0 NaN Detchd 1990.0 Unf 1 240 TA TA Y 0 0 252 0 0 0 NaN NaN NaN 0 5 2007 WD Normal 129000
1444 1447 20 RL NaN 26142 Pave NaN IR1 Lvl AllPub CulDSac Gtl Mitchel Norm Norm 1Fam 1Story 5 7 1962 1962 Gable CompShg HdBoard HdBoard BrkFace 189.0 TA TA CBlock TA TA No Rec 593 Unf 0 595 1188 GasA TA Y SBrkr 1188 0 0 1188 0 0 1 0 3 1 TA 6 Typ 0 NaN Attchd 1962.0 Unf 1 312 TA TA P 261 39 0 0 0 0 NaN NaN NaN 0 4 2010 WD Normal 157900
1445 1448 60 RL 80.0 10000 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 8 5 1995 1996 Gable CompShg VinylSd VinylSd BrkFace 438.0 Gd TA PConc Gd TA No GLQ 1079 Unf 0 141 1220 GasA Ex Y SBrkr 1220 870 0 2090 1 0 2 1 3 1 Gd 8 Typ 1 TA Attchd 1995.0 RFn 2 556 TA TA Y 0 65 0 0 0 0 NaN NaN NaN 0 12 2007 WD Normal 240000
1446 1449 50 RL 70.0 11767 Pave NaN Reg Lvl AllPub Inside Gtl Edwards Norm Norm 1Fam 2Story 4 7 1910 2000 Gable CompShg MetalSd HdBoard None 0.0 TA TA CBlock Fa TA No Unf 0 Unf 0 560 560 GasA Gd N SBrkr 796 550 0 1346 0 0 1 1 2 1 TA 6 Min2 0 NaN Detchd 1950.0 Unf 1 384 Fa TA Y 168 24 0 0 0 0 NaN GdWo NaN 0 5 2007 WD Normal 112000
1447 1450 180 RM 21.0 1533 Pave NaN Reg Lvl AllPub Inside Gtl MeadowV Norm Norm Twnhs SFoyer 5 7 1970 1970 Gable CompShg CemntBd CmentBd None 0.0 TA TA CBlock Gd TA Av GLQ 553 Unf 0 77 630 GasA Ex Y SBrkr 630 0 0 630 1 0 1 0 1 1 Ex 3 Typ 0 NaN NaN NaN NaN 0 0 NaN NaN Y 0 0 0 0 0 0 NaN NaN NaN 0 8 2006 WD Abnorml 92000
1448 1451 90 RL 60.0 9000 Pave NaN Reg Lvl AllPub FR2 Gtl NAmes Norm Norm Duplex 2Story 5 5 1974 1974 Gable CompShg VinylSd VinylSd None 0.0 TA TA CBlock Gd TA No Unf 0 Unf 0 896 896 GasA TA Y SBrkr 896 896 0 1792 0 0 2 2 4 2 TA 8 Typ 0 NaN NaN NaN NaN 0 0 NaN NaN Y 32 45 0 0 0 0 NaN NaN NaN 0 9 2009 WD Normal 136000
1449 1452 20 RL 78.0 9262 Pave NaN Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 8 5 2008 2009 Gable CompShg CemntBd CmentBd Stone 194.0 Gd TA PConc Gd TA No Unf 0 Unf 0 1573 1573 GasA Ex Y SBrkr 1578 0 0 1578 0 0 2 0 3 1 Ex 7 Typ 1 Gd Attchd 2008.0 Fin 3 840 TA TA Y 0 36 0 0 0 0 NaN NaN NaN 0 5 2009 New Partial 287090
1450 1453 180 RM 35.0 3675 Pave NaN Reg Lvl AllPub Inside Gtl Edwards Norm Norm TwnhsE SLvl 5 5 2005 2005 Gable CompShg VinylSd VinylSd BrkFace 80.0 TA TA PConc Gd TA Gd GLQ 547 Unf 0 0 547 GasA Gd Y SBrkr 1072 0 0 1072 1 0 1 0 2 1 TA 5 Typ 0 NaN Basment 2005.0 Fin 2 525 TA TA Y 0 28 0 0 0 0 NaN NaN NaN 0 5 2006 WD Normal 145000
1451 1454 20 RL 90.0 17217 Pave NaN Reg Lvl AllPub Inside Gtl Mitchel Norm Norm 1Fam 1Story 5 5 2006 2006 Gable CompShg VinylSd VinylSd None 0.0 TA TA PConc Gd TA No Unf 0 Unf 0 1140 1140 GasA Ex Y SBrkr 1140 0 0 1140 0 0 1 0 3 1 TA 6 Typ 0 NaN NaN NaN NaN 0 0 NaN NaN Y 36 56 0 0 0 0 NaN NaN NaN 0 7 2006 WD Abnorml 84500
1452 1455 20 FV 62.0 7500 Pave Pave Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 7 5 2004 2005 Gable CompShg VinylSd VinylSd None 0.0 Gd TA PConc Gd TA No GLQ 410 Unf 0 811 1221 GasA Ex Y SBrkr 1221 0 0 1221 1 0 2 0 2 1 Gd 6 Typ 0 NaN Attchd 2004.0 RFn 2 400 TA TA Y 0 113 0 0 0 0 NaN NaN NaN 0 10 2009 WD Normal 185000
1453 1456 60 RL 62.0 7917 Pave NaN Reg Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 6 5 1999 2000 Gable CompShg VinylSd VinylSd None 0.0 TA TA PConc Gd TA No Unf 0 Unf 0 953 953 GasA Ex Y SBrkr 953 694 0 1647 0 0 2 1 3 1 TA 7 Typ 1 TA Attchd 1999.0 RFn 2 460 TA TA Y 0 40 0 0 0 0 NaN NaN NaN 0 8 2007 WD Normal 175000
1454 1457 20 RL 85.0 13175 Pave NaN Reg Lvl AllPub Inside Gtl NWAmes Norm Norm 1Fam 1Story 6 6 1978 1988 Gable CompShg Plywood Plywood Stone 119.0 TA TA CBlock Gd TA No ALQ 790 Rec 163 589 1542 GasA TA Y SBrkr 2073 0 0 2073 1 0 2 0 3 1 TA 7 Min1 2 TA Attchd 1978.0 Unf 2 500 TA TA Y 349 0 0 0 0 0 NaN MnPrv NaN 0 2 2010 WD Normal 210000
1455 1458 70 RL 66.0 9042 Pave NaN Reg Lvl AllPub Inside Gtl Crawfor Norm Norm 1Fam 2Story 7 9 1941 2006 Gable CompShg CemntBd CmentBd None 0.0 Ex Gd Stone TA Gd No GLQ 275 Unf 0 877 1152 GasA Ex Y SBrkr 1188 1152 0 2340 0 0 2 0 4 1 Gd 9 Typ 2 Gd Attchd 1941.0 RFn 1 252 TA TA Y 0 60 0 0 0 0 NaN GdPrv Shed 2500 5 2010 WD Normal 266500
1456 1459 20 RL 68.0 9717 Pave NaN Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 6 1950 1996 Hip CompShg MetalSd MetalSd None 0.0 TA TA CBlock TA TA Mn GLQ 49 Rec 1029 0 1078 GasA Gd Y FuseA 1078 0 0 1078 1 0 1 0 2 1 Gd 5 Typ 0 NaN Attchd 1950.0 Unf 1 240 TA TA Y 366 0 112 0 0 0 NaN NaN NaN 0 4 2010 WD Normal 142125
1457 1460 20 RL 75.0 9937 Pave NaN Reg Lvl AllPub Inside Gtl Edwards Norm Norm 1Fam 1Story 5 6 1965 1965 Gable CompShg HdBoard HdBoard None 0.0 Gd TA CBlock TA TA No BLQ 830 LwQ 290 136 1256 GasA Gd Y SBrkr 1256 0 0 1256 1 0 1 1 3 1 TA 6 Typ 0 NaN Attchd 1965.0 Fin 1 276 TA TA Y 736 68 0 0 0 0 NaN NaN NaN 0 6 2008 WD Normal 147500

1458 rows × 81 columns

In [158]:
data = train.append(test, ignore_index=True)
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\frame.py:6211: FutureWarning:

Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.


In [159]:
data.describe()
Out[159]:
1stFlrSF 2ndFlrSF 3SsnPorch BedroomAbvGr BsmtFinSF1 BsmtFinSF2 BsmtFullBath BsmtHalfBath BsmtUnfSF EnclosedPorch Fireplaces FullBath GarageArea GarageCars GarageYrBlt GrLivArea HalfBath Id KitchenAbvGr LotArea LotFrontage LowQualFinSF MSSubClass MasVnrArea MiscVal MoSold OpenPorchSF OverallCond OverallQual PoolArea SalePrice ScreenPorch TotRmsAbvGrd TotalBsmtSF WoodDeckSF YearBuilt YearRemodAdd YrSold
count 2917.000000 2917.000000 2917.000000 2917.000000 2916.000000 2916.000000 2915.000000 2915.000000 2916.000000 2917.000000 2917.000000 2917.000000 2916.000000 2916.000000 2758.000000 2917.000000 2917.000000 2917.000000 2917.000000 2917.000000 2431.000000 2917.000000 2917.000000 2894.000000 2917.000000 2917.000000 2917.000000 2917.000000 2917.000000 2917.000000 1458.000000 2917.000000 2917.000000 2916.000000 2917.000000 2917.000000 2917.000000 2917.000000
mean 1157.692492 335.861502 2.604045 2.860130 439.015432 49.616255 0.429160 0.061407 560.695816 23.114158 0.596160 1.567364 472.409465 1.766118 1978.092096 1498.251628 0.379842 1460.376071 1.044566 10139.439150 69.180584 4.697635 57.135756 101.733587 50.860816 6.213576 47.280082 5.564964 6.086390 2.088790 180932.919067 16.073363 6.448063 1049.327503 93.629414 1971.287967 1984.248200 2007.792938
std 385.264298 428.119663 25.196714 0.822967 444.182329 169.258662 0.524002 0.245766 439.651650 64.263424 0.644773 0.552465 214.620878 0.761531 25.571300 496.908626 0.502782 842.892456 0.214532 7807.036512 22.791719 46.412570 42.532140 178.510291 567.595198 2.713070 67.118965 1.113414 1.406704 34.561371 79495.055285 56.202054 1.564281 429.105905 126.532643 30.286991 20.892257 1.315328
min 334.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1895.000000 334.000000 0.000000 1.000000 0.000000 1300.000000 21.000000 0.000000 20.000000 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000 0.000000 34900.000000 0.000000 2.000000 0.000000 0.000000 1872.000000 1950.000000 2006.000000
25% 876.000000 0.000000 0.000000 2.000000 0.000000 0.000000 0.000000 0.000000 220.000000 0.000000 0.000000 1.000000 320.000000 1.000000 1960.000000 1126.000000 0.000000 731.000000 1.000000 7476.000000 59.000000 0.000000 20.000000 0.000000 0.000000 4.000000 0.000000 5.000000 5.000000 0.000000 129925.000000 0.000000 5.000000 793.000000 0.000000 1953.000000 1965.000000 2007.000000
50% 1082.000000 0.000000 0.000000 3.000000 368.000000 0.000000 0.000000 0.000000 467.000000 0.000000 1.000000 2.000000 480.000000 2.000000 1979.000000 1444.000000 0.000000 1461.000000 1.000000 9452.000000 68.000000 0.000000 50.000000 0.000000 0.000000 6.000000 26.000000 5.000000 6.000000 0.000000 163000.000000 0.000000 6.000000 988.500000 0.000000 1973.000000 1993.000000 2008.000000
75% 1384.000000 704.000000 0.000000 3.000000 733.000000 0.000000 1.000000 0.000000 804.500000 0.000000 1.000000 2.000000 576.000000 2.000000 2002.000000 1743.000000 1.000000 2190.000000 1.000000 11556.000000 80.000000 0.000000 70.000000 164.000000 0.000000 8.000000 70.000000 6.000000 7.000000 0.000000 214000.000000 0.000000 7.000000 1302.000000 168.000000 2001.000000 2004.000000 2009.000000
max 5095.000000 2065.000000 508.000000 8.000000 4010.000000 1526.000000 3.000000 2.000000 2336.000000 1012.000000 4.000000 4.000000 1488.000000 5.000000 2207.000000 5095.000000 2.000000 2919.000000 3.000000 215245.000000 313.000000 1064.000000 190.000000 1600.000000 17000.000000 12.000000 742.000000 9.000000 10.000000 800.000000 755000.000000 576.000000 15.000000 5095.000000 1424.000000 2010.000000 2010.000000 2010.000000
In [160]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2917 entries, 0 to 2916
Data columns (total 81 columns):
1stFlrSF         2917 non-null int64
2ndFlrSF         2917 non-null int64
3SsnPorch        2917 non-null int64
Alley            198 non-null object
BedroomAbvGr     2917 non-null int64
BldgType         2917 non-null object
BsmtCond         2835 non-null object
BsmtExposure     2835 non-null object
BsmtFinSF1       2916 non-null float64
BsmtFinSF2       2916 non-null float64
BsmtFinType1     2838 non-null object
BsmtFinType2     2837 non-null object
BsmtFullBath     2915 non-null float64
BsmtHalfBath     2915 non-null float64
BsmtQual         2836 non-null object
BsmtUnfSF        2916 non-null float64
CentralAir       2917 non-null object
Condition1       2917 non-null object
Condition2       2917 non-null object
Electrical       2916 non-null object
EnclosedPorch    2917 non-null int64
ExterCond        2917 non-null object
ExterQual        2917 non-null object
Exterior1st      2916 non-null object
Exterior2nd      2916 non-null object
Fence            571 non-null object
FireplaceQu      1497 non-null object
Fireplaces       2917 non-null int64
Foundation       2917 non-null object
FullBath         2917 non-null int64
Functional       2915 non-null object
GarageArea       2916 non-null float64
GarageCars       2916 non-null float64
GarageCond       2758 non-null object
GarageFinish     2758 non-null object
GarageQual       2758 non-null object
GarageType       2760 non-null object
GarageYrBlt      2758 non-null float64
GrLivArea        2917 non-null int64
HalfBath         2917 non-null int64
Heating          2917 non-null object
HeatingQC        2917 non-null object
HouseStyle       2917 non-null object
Id               2917 non-null int64
KitchenAbvGr     2917 non-null int64
KitchenQual      2916 non-null object
LandContour      2917 non-null object
LandSlope        2917 non-null object
LotArea          2917 non-null int64
LotConfig        2917 non-null object
LotFrontage      2431 non-null float64
LotShape         2917 non-null object
LowQualFinSF     2917 non-null int64
MSSubClass       2917 non-null int64
MSZoning         2913 non-null object
MasVnrArea       2894 non-null float64
MasVnrType       2893 non-null object
MiscFeature      105 non-null object
MiscVal          2917 non-null int64
MoSold           2917 non-null int64
Neighborhood     2917 non-null object
OpenPorchSF      2917 non-null int64
OverallCond      2917 non-null int64
OverallQual      2917 non-null int64
PavedDrive       2917 non-null object
PoolArea         2917 non-null int64
PoolQC           9 non-null object
RoofMatl         2917 non-null object
RoofStyle        2917 non-null object
SaleCondition    2917 non-null object
SalePrice        1458 non-null float64
SaleType         2916 non-null object
ScreenPorch      2917 non-null int64
Street           2917 non-null object
TotRmsAbvGrd     2917 non-null int64
TotalBsmtSF      2916 non-null float64
Utilities        2915 non-null object
WoodDeckSF       2917 non-null int64
YearBuilt        2917 non-null int64
YearRemodAdd     2917 non-null int64
YrSold           2917 non-null int64
dtypes: float64(12), int64(26), object(43)
memory usage: 1.8+ MB
In [161]:
data.isnull().sum()
Out[161]:
1stFlrSF            0
2ndFlrSF            0
3SsnPorch           0
Alley            2719
BedroomAbvGr        0
BldgType            0
BsmtCond           82
BsmtExposure       82
BsmtFinSF1          1
BsmtFinSF2          1
BsmtFinType1       79
BsmtFinType2       80
BsmtFullBath        2
BsmtHalfBath        2
BsmtQual           81
BsmtUnfSF           1
CentralAir          0
Condition1          0
Condition2          0
Electrical          1
EnclosedPorch       0
ExterCond           0
ExterQual           0
Exterior1st         1
Exterior2nd         1
Fence            2346
FireplaceQu      1420
Fireplaces          0
Foundation          0
FullBath            0
                 ... 
LotShape            0
LowQualFinSF        0
MSSubClass          0
MSZoning            4
MasVnrArea         23
MasVnrType         24
MiscFeature      2812
MiscVal             0
MoSold              0
Neighborhood        0
OpenPorchSF         0
OverallCond         0
OverallQual         0
PavedDrive          0
PoolArea            0
PoolQC           2908
RoofMatl            0
RoofStyle           0
SaleCondition       0
SalePrice        1459
SaleType            1
ScreenPorch         0
Street              0
TotRmsAbvGrd        0
TotalBsmtSF         1
Utilities           2
WoodDeckSF          0
YearBuilt           0
YearRemodAdd        0
YrSold              0
Length: 81, dtype: int64

Lots of null values but can't see all the columns

In [162]:
# Creating a list to see all the columns that have any missing values, assigning to a variable in case I want to use this at some point
list_of_na_columns = data.columns[data.isna().any()].tolist()
list_of_na_columns
Out[162]:
['Alley',
 'BsmtCond',
 'BsmtExposure',
 'BsmtFinSF1',
 'BsmtFinSF2',
 'BsmtFinType1',
 'BsmtFinType2',
 'BsmtFullBath',
 'BsmtHalfBath',
 'BsmtQual',
 'BsmtUnfSF',
 'Electrical',
 'Exterior1st',
 'Exterior2nd',
 'Fence',
 'FireplaceQu',
 'Functional',
 'GarageArea',
 'GarageCars',
 'GarageCond',
 'GarageFinish',
 'GarageQual',
 'GarageType',
 'GarageYrBlt',
 'KitchenQual',
 'LotFrontage',
 'MSZoning',
 'MasVnrArea',
 'MasVnrType',
 'MiscFeature',
 'PoolQC',
 'SalePrice',
 'SaleType',
 'TotalBsmtSF',
 'Utilities']

That is a lot of information. Let's start with finding and filling missing values and then run the report again

  • Alley has 2721 / 93.2% missing values Missing
  • BsmtCond has 82 / 2.8% missing values Missing
  • BsmtExposure has 82 / 2.8% missing values Missing
  • BsmtFinType1 has 79 / 2.7% missing values Missing
  • BsmtFinType2 has 80 / 2.7% missing values Missing
  • BsmtUnfSF 1 missing values Missing
  • BsmtFinSF1 1 missing values Missing
  • BsmtFinSF2 1 missing values Missing
  • BsmtQual has 81 / 2.8% missing values Missing
  • Fence has 2348 / 80.4% missing values Missing
  • FireplaceQu has 1420 / 48.6% missing values Missing
  • GarageCond has 159 / 5.4% missing values Missing
  • GarageFinish has 159 / 5.4% missing values Missing
  • GarageQual has 159 / 5.4% missing values Missing
  • GarageType has 157 / 5.4% missing values Missing
  • GarageYrBlt has 159 / 5.4% missing values Missing
  • LotFrontage has 486 / 16.6% missing values Missing
  • MiscFeature has 2814 / 96.4% missing values Missing
  • PoolQC has 2909 / 99.7% missing values Missing
  • SalePrice has 1459 / 50.0% missing values Missing

Breaking Down null values

  1. null values where we know we can just impute 'None'
    • Alley we can impute NaN with None, since we can assume they don't have an alley
    • Fence we can also do the same
    • MiscFeature we can also do the same
    • FireplaceQu we can also do the same since we can see that the 'Fireplaces' column has 1420 zeros and FireplaceQu has 1420 NaN
  2. Null Values where we need to do more investigation
    • PoolQC. We have 13 non 0 values in the Pool Area and only 10 values for PoolQC, meaning there are 3 PoolQC values that are not 'None', so first we need to find the rows that have PoolArea filled in where PoolQC are null and fill those in somehow, then we can impute the rest with 'None'
    • Bsmt stuff
    • Garage stuff
    • LotFrontage Possibly bin LotArea and take the average based on that
    • Function
    • KitchenQual

Starting with filling in PoolQC values

In [165]:
data.loc[(data.PoolArea!=0) & (data.PoolQC.isnull())]
Out[165]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2418 1647 0 0 NaN 3 1Fam TA No 595.0 354.0 BLQ Rec 1.0 0.0 TA 156.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd GdPrv Fa 1 CBlock 1 Min1 280.0 1.0 TA Fin TA Attchd 1953.0 1647 0 GasA Gd 1Story 2421 1 TA Lvl Gtl 9532 Inside 75.0 Reg 0 20 RL 0.0 None NaN 0 2 NAmes 0 6 4 Y 368 NaN CompShg Gable Normal NaN WD 0 Pave 6 1105.0 AllPub 225 1953 1953 2007
2501 1105 717 0 NaN 4 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 1105.0 Y Feedr Norm SBrkr 1012 TA TA Wd Sdng Wd Sdng NaN Po 1 CBlock 2 Min2 515.0 2.0 TA Unf TA Attchd 1984.0 1822 0 GasA Ex 1.5Fin 2504 1 Gd Lvl Gtl 23920 Inside 104.0 Reg 0 50 RL 0.0 None NaN 0 4 SawyerW 195 5 6 P 444 NaN CompShg Gable Normal NaN WD 0 Pave 7 1105.0 AllPub 0 1984 1984 2007
2597 2034 0 0 NaN 2 1Fam NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN 0.0 Y Artery Norm SBrkr 0 TA TA MetalSd MetalSd GdPrv NaN 0 CBlock 1 Min1 1041.0 4.0 TA RFn TA 2Types 1953.0 2034 0 GasA Ex 1Story 2600 1 TA Lvl Gtl 43500 Inside 200.0 Reg 0 20 RL 0.0 None NaN 0 6 Mitchel 266 5 3 N 561 NaN CompShg Gable Normal NaN WD 0 Pave 9 0.0 AllPub 483 1953 1953 2007
In [166]:
# Make some graphs to see if I can impute these by that information
fig= plt.figure(figsize=(16,8))
ax1 = fig.add_subplot(121)
sns.boxplot(x='PoolQC', y='PoolArea', data=data, ax=ax1)
ax2 = fig.add_subplot(122)
sns.boxplot(x='PoolQC', y='SalePrice', data=data, ax=ax2)
Out[166]:
<matplotlib.axes._subplots.AxesSubplot at 0x279e24e06d8>

There aren't a lot of values in this category, so It doesn't matter too much but based on these graphs I'll fill the smaller pool area values with 'Ex' and the 561 pool area with 'Fa'

In [167]:
# Changing the specific null values to 'Ex' or 'Fa'
data['PoolQC'].loc[(data['PoolQC'].isnull()) & (data['PoolArea'] < 500) & (data['PoolArea'] !=0)] = 'Ex'
data['PoolQC'].loc[(data['PoolQC'].isnull()) & (data['PoolArea'] > 500)] = 'Fa'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Done with pool until we do the mass fill in of null values

Basement Investigation

In [168]:
bsmt_list = [col for col in data if 'Bsmt' in col]
bsmt_list
Out[168]:
['BsmtCond',
 'BsmtExposure',
 'BsmtFinSF1',
 'BsmtFinSF2',
 'BsmtFinType1',
 'BsmtFinType2',
 'BsmtFullBath',
 'BsmtHalfBath',
 'BsmtQual',
 'BsmtUnfSF',
 'TotalBsmtSF']

BsmtCond

In [169]:
data.loc[(data.TotalBsmtSF != 0) & (data.BsmtCond.isnull())]
Out[169]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2038 1671 0 0 NaN 3 1Fam NaN Mn 1044.0 382.0 GLQ Rec 1.0 0.0 Gd 0.0 Y Norm Norm SBrkr 0 Ex Ex VinylSd VinylSd GdWo Gd 1 CBlock 3 Typ 550.0 2.0 TA RFn TA Attchd 1976.0 1671 0 GasA Ex 1Story 2041 1 Ex Lvl Gtl 16280 Inside 103.0 Reg 0 20 RL 0.0 None NaN 0 5 Veenker 90 9 8 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 1426.0 AllPub 280 1976 2007 2008
2118 896 0 0 NaN 2 1Fam NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Y Feedr Norm FuseA 0 TA TA MetalSd CBlock MnPrv NaN 0 PConc 1 Typ 280.0 1.0 TA Unf TA Detchd 1946.0 896 0 GasA TA 1Story 2121 1 TA Lvl Gtl 5940 FR3 99.0 IR1 0 20 RM 0.0 None NaN 0 4 BrkSide 0 7 4 Y 0 NaN CompShg Gable Abnorml NaN ConLD 0 Pave 4 NaN AllPub 0 1946 1950 2008
2183 1127 0 0 NaN 3 1Fam NaN No 1033.0 0.0 BLQ Unf 0.0 1.0 TA 94.0 Y Norm Norm SBrkr 138 TA TA HdBoard Plywood NaN Po 1 CBlock 1 Typ 480.0 2.0 TA Unf TA Detchd 1991.0 1127 1 GasA TA 1Story 2186 1 TA Lvl Gtl 6500 Inside 65.0 Reg 0 20 RL 84.0 BrkFace NaN 0 5 Edwards 0 6 6 Y 0 NaN CompShg Hip Normal NaN WD 0 Pave 6 1127.0 AllPub 0 1976 1976 2008
2522 1009 0 0 NaN 3 1Fam NaN Av 755.0 0.0 ALQ Unf 0.0 0.0 TA 240.0 Y Norm Norm SBrkr 0 TA TA Plywood VinylSd MnPrv Fa 1 CBlock 2 Typ 576.0 2.0 TA Unf TA Detchd 1977.0 1009 0 GasA TA SLvl 2525 1 TA Lvl Gtl 9720 Inside 72.0 Reg 0 80 RL 51.0 BrkFace NaN 0 6 CollgCr 0 7 5 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 995.0 AllPub 0 1977 1977 2007
In [170]:
data.loc[(data.BsmtCond.isnull()) & ((data.index ==2040 ) | (data.index==2185) | (data.index ==2524))]
Out[170]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
In [171]:
# Make some graphs to see if I can impute these by that information
fig= plt.figure(figsize=(16,8))
ax1 = fig.add_subplot(221)
sns.boxplot(x='BsmtCond', y='BsmtFinSF1', data=data, ax=ax1)
ax2 = fig.add_subplot(222)
sns.boxplot(x='OverallCond', y='TotalBsmtSF', data=data, ax=ax2)
ax3 = fig.add_subplot(223)
sns.boxplot(x='BsmtFinType1', y='BsmtFinSF1', data=data, ax=ax3)
ax4 = fig.add_subplot(224)
sns.boxplot(x='BsmtExposure', y='TotalBsmtSF', data=data, ax=ax4)
Out[171]:
<matplotlib.axes._subplots.AxesSubplot at 0x279dbb47a20>
In [172]:
# Comparing overallcond with bsmtcond
plt.figure(figsize=(16,18))
g = sns.lmplot( x="TotalBsmtSF", y="OverallCond", data=data, fit_reg=False, hue='BsmtCond', scatter_kws={"s": 50},height=10)
g.set(xlim=(700, 2500))
Out[172]:
<seaborn.axisgrid.FacetGrid at 0x279df6f3e80>
<Figure size 1600x1800 with 0 Axes>
In [173]:
# Confirming the numbers
data.loc[(data.OverallCond ==5) & (data.BsmtCond == 'TA')]
Out[173]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
0 856 854 0 NaN 3 1Fam TA No 706.0 0.0 GLQ Unf 1.0 0.0 Gd 150.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Typ 548.0 2.0 TA RFn TA Attchd 2003.0 1710 1 GasA Ex 2Story 1 1 Gd Lvl Gtl 8450 Inside 65.0 Reg 0 60 RL 196.0 BrkFace NaN 0 2 CollgCr 61 5 7 Y 0 NaN CompShg Gable Normal 208500.0 WD 0 Pave 8 856.0 AllPub 0 2003 2003 2008
2 920 866 0 NaN 3 1Fam TA Mn 486.0 0.0 GLQ Unf 1.0 0.0 Gd 434.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN TA 1 PConc 2 Typ 608.0 2.0 TA RFn TA Attchd 2001.0 1786 1 GasA Ex 2Story 3 1 Gd Lvl Gtl 11250 Inside 68.0 IR1 0 60 RL 162.0 BrkFace NaN 0 9 CollgCr 42 5 7 Y 0 NaN CompShg Gable Normal 223500.0 WD 0 Pave 6 920.0 AllPub 0 2001 2002 2008
4 1145 1053 0 NaN 4 1Fam TA Av 655.0 0.0 GLQ Unf 1.0 0.0 Gd 490.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN TA 1 PConc 2 Typ 836.0 3.0 TA RFn TA Attchd 2000.0 2198 1 GasA Ex 2Story 5 1 Gd Lvl Gtl 14260 FR2 84.0 IR1 0 60 RL 350.0 BrkFace NaN 0 12 NoRidge 84 5 8 Y 0 NaN CompShg Gable Normal 250000.0 WD 0 Pave 9 1145.0 AllPub 192 2000 2000 2008
5 796 566 320 NaN 1 1Fam TA No 732.0 0.0 GLQ Unf 1.0 0.0 Gd 64.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd MnPrv NaN 0 Wood 1 Typ 480.0 2.0 TA Unf TA Attchd 1993.0 1362 1 GasA Ex 1.5Fin 6 1 TA Lvl Gtl 14115 Inside 85.0 IR1 0 50 RL 0.0 None Shed 700 10 Mitchel 30 5 5 Y 0 NaN CompShg Gable Normal 143000.0 WD 0 Pave 5 796.0 AllPub 40 1993 1995 2009
6 1694 0 0 NaN 3 1Fam TA Av 1369.0 0.0 GLQ Unf 1.0 0.0 Ex 317.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 636.0 2.0 TA RFn TA Attchd 2004.0 1694 0 GasA Ex 1Story 7 1 Gd Lvl Gtl 10084 Inside 75.0 Reg 0 20 RL 186.0 Stone NaN 0 8 Somerst 57 5 8 Y 0 NaN CompShg Gable Normal 307000.0 WD 0 Pave 7 1686.0 AllPub 255 2004 2005 2007
8 1022 752 0 NaN 2 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 952.0 Y Artery Norm FuseF 205 TA TA BrkFace Wd Shng NaN TA 2 BrkTil 2 Min1 468.0 2.0 TA Unf Fa Detchd 1931.0 1774 0 GasA Gd 1.5Fin 9 2 TA Lvl Gtl 6120 Inside 51.0 Reg 0 50 RM 0.0 None NaN 0 4 OldTown 0 5 7 Y 0 NaN CompShg Gable Abnorml 129900.0 WD 0 Pave 8 952.0 AllPub 90 1931 1950 2008
10 1040 0 0 NaN 3 1Fam TA No 906.0 0.0 Rec Unf 1.0 0.0 TA 134.0 Y Norm Norm SBrkr 0 TA TA HdBoard HdBoard NaN NaN 0 CBlock 1 Typ 384.0 1.0 TA Unf TA Detchd 1965.0 1040 0 GasA Ex 1Story 11 1 TA Lvl Gtl 11200 Inside 70.0 Reg 0 20 RL 0.0 None NaN 0 2 Sawyer 0 5 5 Y 0 NaN CompShg Hip Normal 129500.0 WD 0 Pave 5 1040.0 AllPub 0 1965 1965 2008
11 1182 1142 0 NaN 4 1Fam TA No 998.0 0.0 GLQ Unf 1.0 0.0 Ex 177.0 Y Norm Norm SBrkr 0 TA Ex WdShing Wd Shng NaN Gd 2 PConc 3 Typ 736.0 3.0 TA Fin TA BuiltIn 2005.0 2324 0 GasA Ex 2Story 12 1 Ex Lvl Gtl 11924 Inside 85.0 IR1 0 60 RL 286.0 Stone NaN 0 7 NridgHt 21 5 9 Y 0 NaN CompShg Hip Partial 345000.0 New 0 Pave 11 1175.0 AllPub 147 2005 2006 2006
13 1494 0 0 NaN 3 1Fam TA Av 0.0 0.0 Unf Unf 0.0 0.0 Gd 1494.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 840.0 3.0 TA RFn TA Attchd 2006.0 1494 0 GasA Ex 1Story 14 1 Gd Lvl Gtl 10652 Inside 91.0 IR1 0 20 RL 306.0 Stone NaN 0 8 CollgCr 33 5 7 Y 0 NaN CompShg Gable Partial 279500.0 New 0 Pave 7 1494.0 AllPub 160 2006 2007 2007
14 1253 0 0 NaN 2 1Fam TA No 733.0 0.0 BLQ Unf 1.0 0.0 TA 520.0 Y Norm Norm SBrkr 176 TA TA MetalSd MetalSd GdWo Fa 1 CBlock 1 Typ 352.0 1.0 TA RFn TA Attchd 1960.0 1253 1 GasA TA 1Story 15 1 TA Lvl Gtl 10920 Corner NaN IR1 0 20 RL 212.0 BrkFace NaN 0 5 NAmes 213 5 6 Y 0 NaN CompShg Hip Normal 157000.0 WD 0 Pave 5 1253.0 AllPub 0 1960 1960 2008
18 1114 0 0 NaN 3 1Fam TA No 646.0 0.0 GLQ Unf 1.0 0.0 TA 468.0 Y RRAe Norm SBrkr 0 TA TA VinylSd VinylSd NaN NaN 0 PConc 1 Typ 576.0 2.0 TA Unf TA Detchd 2004.0 1114 1 GasA Ex 1Story 19 1 Gd Lvl Gtl 13695 Inside 66.0 Reg 0 20 RL 0.0 None NaN 0 6 SawyerW 102 5 5 Y 0 NaN CompShg Gable Normal 159000.0 WD 0 Pave 6 1114.0 AllPub 0 2004 2004 2008
20 1158 1218 0 NaN 4 1Fam TA Av 0.0 0.0 Unf Unf 0.0 0.0 Ex 1158.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 3 Typ 853.0 3.0 TA RFn TA BuiltIn 2005.0 2376 1 GasA Ex 2Story 21 1 Gd Lvl Gtl 14215 Corner 101.0 IR1 0 60 RL 380.0 BrkFace NaN 0 11 NridgHt 154 5 8 Y 0 NaN CompShg Gable Partial 325300.0 New 0 Pave 9 1158.0 AllPub 240 2005 2006 2006
22 1795 0 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Gd 1777.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 534.0 2.0 TA RFn TA Attchd 2002.0 1795 0 GasA Ex 1Story 23 1 Gd Lvl Gtl 9742 Inside 75.0 Reg 0 20 RL 281.0 BrkFace NaN 0 9 CollgCr 159 5 8 Y 0 NaN CompShg Hip Normal 230000.0 WD 0 Pave 7 1777.0 AllPub 171 2002 2002 2008
25 1600 0 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Gd 1566.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 890.0 3.0 TA RFn TA Attchd 2007.0 1600 0 GasA Ex 1Story 26 1 Gd Lvl Gtl 14230 Corner 110.0 Reg 0 20 RL 640.0 Stone NaN 0 7 NridgHt 56 5 8 Y 0 NaN CompShg Gable Normal 256300.0 WD 0 Pave 7 1566.0 AllPub 0 2007 2007 2009
27 1704 0 0 NaN 3 1Fam TA No 1218.0 0.0 GLQ Unf 1.0 0.0 Ex 486.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 772.0 3.0 TA RFn TA Attchd 2008.0 1704 0 GasA Ex 1Story 28 1 Gd Lvl Gtl 11478 Inside 98.0 Reg 0 20 RL 200.0 Stone NaN 0 5 NridgHt 50 5 8 Y 0 NaN CompShg Gable Normal 306000.0 WD 0 Pave 7 1704.0 AllPub 0 2007 2008 2010
32 1234 0 0 NaN 3 1Fam TA Av 0.0 0.0 Unf Unf 0.0 0.0 Ex 1234.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Typ 484.0 2.0 TA RFn TA Attchd 2007.0 1234 0 GasA Ex 1Story 33 1 Gd Lvl Gtl 11049 Corner 85.0 Reg 0 20 RL 0.0 None NaN 0 1 CollgCr 30 5 8 Y 0 NaN CompShg Gable Normal 179900.0 WD 0 Pave 7 1234.0 AllPub 0 2007 2007 2008
33 1700 0 0 NaN 4 1Fam TA No 1018.0 0.0 Rec Unf 0.0 1.0 TA 380.0 Y Norm Norm SBrkr 0 TA TA BrkFace BrkFace NaN Gd 1 CBlock 1 Typ 447.0 2.0 TA RFn TA Attchd 1959.0 1700 1 GasA Gd 1Story 34 1 Gd Lvl Gtl 10552 Inside 70.0 IR1 0 20 RL 0.0 None NaN 0 4 NAmes 38 5 5 Y 0 NaN CompShg Hip Normal 165500.0 WD 0 Pave 6 1398.0 AllPub 0 1959 1959 2010
34 1561 0 0 NaN 2 TwnhsE TA No 1153.0 0.0 GLQ Unf 1.0 0.0 Ex 408.0 Y Norm Norm SBrkr 0 TA Ex MetalSd MetalSd NaN Gd 1 PConc 2 Typ 556.0 2.0 TA Fin TA Attchd 2005.0 1561 0 GasA Ex 1Story 35 1 Ex Lvl Gtl 7313 Inside 60.0 Reg 0 120 RL 246.0 BrkFace NaN 0 8 NridgHt 47 5 9 Y 0 NaN CompShg Hip Normal 277500.0 WD 0 Pave 6 1561.0 AllPub 203 2005 2005 2007
35 1132 1320 0 NaN 4 1Fam TA Av 0.0 0.0 Unf Unf 0.0 0.0 Ex 1117.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 3 Typ 691.0 3.0 TA Fin TA BuiltIn 2004.0 2452 1 GasA Ex 2Story 36 1 Gd Lvl Gtl 13418 Inside 108.0 Reg 0 60 RL 132.0 Stone NaN 0 9 NridgHt 32 5 8 Y 0 NaN CompShg Gable Normal 309000.0 WD 0 Pave 9 1117.0 AllPub 113 2004 2005 2006
36 1097 0 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Gd 1097.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd NaN NaN 0 PConc 1 Typ 672.0 2.0 TA Unf TA Attchd 1995.0 1097 1 GasA Ex 1Story 37 1 TA Lvl Gtl 10859 Corner 112.0 Reg 0 20 RL 0.0 None NaN 0 6 CollgCr 64 5 5 Y 0 NaN CompShg Gable Normal 145000.0 WD 0 Pave 6 1097.0 AllPub 392 1994 1995 2009
40 1324 0 0 NaN 3 1Fam TA No 643.0 0.0 Rec Unf 0.0 0.0 TA 445.0 Y Norm Norm SBrkr 0 TA TA Wd Sdng Wd Sdng GdWo TA 1 CBlock 2 Typ 440.0 2.0 TA RFn TA Attchd 1965.0 1324 0 GasA Ex 1Story 41 1 TA Lvl Gtl 8658 Inside 84.0 Reg 0 20 RL 101.0 BrkFace NaN 0 12 NAmes 138 5 6 Y 0 NaN CompShg Gable Abnorml 160000.0 WD 0 Pave 6 1088.0 AllPub 0 1965 1965 2006
45 1752 0 0 NaN 2 TwnhsE TA No 456.0 0.0 GLQ Unf 1.0 0.0 Ex 1296.0 Y Norm Norm SBrkr 0 TA Ex MetalSd MetalSd NaN Gd 1 PConc 2 Typ 576.0 2.0 TA RFn TA Attchd 2005.0 1752 0 GasA Ex 1Story 46 1 Ex Lvl Gtl 7658 Inside 61.0 Reg 0 120 RL 412.0 BrkFace NaN 0 2 NridgHt 82 5 9 Y 0 NaN CompShg Hip Normal 319900.0 WD 0 Pave 6 1752.0 AllPub 196 2005 2005 2010
46 1518 631 0 NaN 1 1Fam TA No 1351.0 0.0 GLQ Unf 1.0 0.0 Ex 83.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Ex 1 PConc 1 Typ 670.0 2.0 TA RFn TA Attchd 2003.0 2149 1 GasA Ex 1.5Fin 47 1 Gd Lvl Gtl 12822 CulDSac 48.0 IR1 0 50 RL 0.0 None NaN 0 8 Mitchel 43 5 7 Y 0 NaN CompShg Gable Abnorml 239686.0 WD 198 Pave 6 1434.0 AllPub 168 2003 2003 2009
47 1656 0 0 NaN 3 1Fam TA Av 24.0 0.0 GLQ Unf 0.0 0.0 Gd 1632.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Typ 826.0 3.0 TA RFn TA Attchd 2006.0 1656 0 GasA Ex 1Story 48 1 Gd Lvl Gtl 11096 Inside 84.0 Reg 0 20 FV 0.0 None NaN 0 7 Somerst 146 5 8 Y 0 NaN CompShg Gable Normal 249700.0 WD 0 Pave 7 1656.0 AllPub 0 2006 2006 2007
48 736 716 0 NaN 2 2fmCon TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 736.0 Y Norm Norm SBrkr 102 TA TA MetalSd MetalSd NaN NaN 0 BrkTil 2 Typ 0.0 0.0 NaN NaN NaN NaN NaN 1452 0 GasA Gd 2Story 49 3 TA Lvl Gtl 4456 Inside 33.0 Reg 0 190 RM 0.0 None NaN 0 6 OldTown 0 5 4 N 0 NaN CompShg Gable Partial 113000.0 New 0 Pave 8 736.0 AllPub 0 1920 2008 2009
52 816 0 0 NaN 2 Duplex TA Gd 104.0 712.0 LwQ GLQ 1.0 0.0 Gd 0.0 N RRNn Norm SBrkr 0 TA Fa Wd Sdng Wd Sdng NaN NaN 0 CBlock 1 Typ 516.0 2.0 TA Unf TA CarPort 1963.0 816 0 GasA TA 1Story 53 1 TA Bnk Mod 8472 Corner 110.0 IR2 0 90 RM 0.0 None NaN 0 5 IDOTRR 0 5 5 Y 0 NaN CompShg Gable Normal 110000.0 WD 0 Grvl 5 816.0 AllPub 106 1963 1963 2010
53 1842 0 0 NaN 0 1Fam TA Gd 1810.0 0.0 GLQ Unf 2.0 0.0 Ex 32.0 Y Norm Norm SBrkr 0 TA Gd WdShing Wd Shng NaN Gd 1 CBlock 0 Typ 894.0 3.0 TA Fin TA Attchd 1981.0 1842 1 GasA Gd 1Story 54 1 Gd Low Gtl 50271 Inside 68.0 IR1 0 20 RL 0.0 None NaN 0 11 Veenker 72 5 9 Y 0 NaN WdShngl Gable Normal 385000.0 WD 0 Pave 5 1842.0 AllPub 857 1981 1987 2006
54 1360 0 0 NaN 3 1Fam TA No 384.0 0.0 ALQ Unf 0.0 0.0 TA 0.0 Y Norm Norm SBrkr 0 TA TA MetalSd MetalSd MnPrv TA 1 CBlock 1 Min1 572.0 2.0 TA Unf TA Detchd 1962.0 1360 0 GasA TA SLvl 55 1 TA Bnk Mod 7134 Inside 60.0 Reg 0 80 RL 0.0 None NaN 0 2 NAmes 50 5 5 Y 0 NaN CompShg Gable Normal 130000.0 WD 0 Pave 6 384.0 AllPub 0 1955 1955 2007
55 1425 0 407 NaN 3 1Fam TA No 490.0 0.0 BLQ Unf 0.0 0.0 TA 935.0 Y Norm Norm SBrkr 0 TA TA HdBoard Plywood NaN Gd 1 CBlock 2 Typ 576.0 2.0 TA RFn TA Attchd 1964.0 1425 0 GasA Gd 1Story 56 1 TA Lvl Gtl 10175 Inside 100.0 IR1 0 20 RL 272.0 BrkFace NaN 0 7 NAmes 0 5 6 Y 0 NaN CompShg Gable Normal 180500.0 WD 0 Pave 7 1425.0 AllPub 0 1964 1964 2008
56 983 756 0 Pave 3 Twnhs TA No 649.0 0.0 GLQ Unf 1.0 0.0 Gd 321.0 Y Norm Norm SBrkr 0 TA Gd MetalSd MetalSd NaN NaN 0 PConc 2 Typ 480.0 2.0 TA Fin TA Attchd 1999.0 1739 1 GasA Ex 2Story 57 1 Gd Lvl Gtl 2645 Inside 24.0 Reg 0 160 FV 456.0 BrkFace NaN 0 8 Somerst 0 5 8 Y 0 NaN CompShg Gable Abnorml 172500.0 WD 0 Pave 7 970.0 AllPub 115 1999 2000 2009
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2848 1065 984 0 NaN 4 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Gd 1065.0 Y Feedr Norm SBrkr 0 TA Gd VinylSd VinylSd NaN TA 1 PConc 2 Typ 467.0 2.0 TA Unf TA Attchd 1997.0 2049 1 GasA Ex 2Story 2851 1 Gd Lvl Gtl 21533 FR2 NaN IR2 0 60 RL 0.0 None NaN 0 8 CollgCr 48 5 7 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 9 1065.0 AllPub 120 1996 1997 2006
2849 1070 869 0 NaN 3 1Fam TA Mn 796.0 0.0 ALQ Unf 0.0 1.0 Gd 258.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd NaN TA 1 PConc 2 Typ 555.0 3.0 TA RFn TA Attchd 1998.0 1939 1 GasA Ex 2Story 2852 1 Gd Lvl Gtl 11250 Corner 90.0 IR1 0 60 RL 227.0 BrkFace NaN 0 5 CollgCr 84 5 7 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 8 1054.0 AllPub 128 1998 1998 2006
2851 848 0 0 NaN 1 TwnhsE TA Av 717.0 0.0 GLQ Unf 1.0 0.0 Gd 131.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 1 Typ 420.0 2.0 TA Fin TA Attchd 2003.0 848 0 GasA Ex 1Story 2854 1 Gd Lvl Gtl 4435 Inside 37.0 Reg 0 120 RM 170.0 BrkFace NaN 0 4 CollgCr 0 5 6 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 4 848.0 AllPub 140 2003 2003 2006
2852 1390 0 0 NaN 3 1Fam TA No 1000.0 0.0 GLQ Unf 1.0 0.0 Gd 390.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Typ 545.0 2.0 TA RFn TA Attchd 2003.0 1390 0 GasA Ex 1Story 2855 1 Gd Lvl Gtl 8810 Inside 70.0 Reg 0 20 RL 0.0 None NaN 0 3 CollgCr 68 5 7 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 7 1390.0 AllPub 0 2003 2003 2006
2854 784 827 0 NaN 3 1Fam TA Av 0.0 0.0 Unf Unf 0.0 0.0 Gd 784.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Typ 572.0 2.0 TA RFn TA Attchd 2005.0 1611 1 GasA Ex 2Story 2857 1 Gd Lvl Gtl 8400 Inside 70.0 Reg 0 60 RL 0.0 None NaN 0 3 CollgCr 36 5 7 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 784.0 AllPub 144 2005 2005 2006
2855 1336 0 0 NaN 3 1Fam TA Mn 996.0 0.0 GLQ Unf 1.0 0.0 Gd 340.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Typ 502.0 2.0 TA Unf TA Attchd 2005.0 1336 0 GasA Ex 1Story 2858 1 Gd Lvl Gtl 8772 FR2 65.0 Reg 0 20 RL 0.0 None NaN 0 9 CollgCr 43 5 7 Y 0 NaN CompShg Gable Partial NaN New 0 Pave 6 1336.0 AllPub 136 2005 2006 2006
2857 1012 0 0 NaN 4 Duplex TA Gd 976.0 0.0 GLQ Unf 0.0 2.0 Gd 0.0 Y Norm Norm SBrkr 0 TA TA Plywood Wd Shng NaN NaN 0 CBlock 2 Typ 0.0 0.0 NaN NaN NaN NaN NaN 1012 0 GasA TA SFoyer 2860 0 TA Lvl Gtl 7840 CulDSac 38.0 IR1 0 90 RL 355.0 BrkFace NaN 0 10 Edwards 0 5 6 Y 0 NaN Tar&Grv Flat AdjLand NaN WD 0 Pave 4 976.0 AllPub 0 1975 1975 2006
2859 806 918 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Gd 796.0 Y Norm Norm SBrkr 0 TA Gd HdBoard Stucco NaN Gd 1 PConc 2 Typ 616.0 2.0 TA Fin TA BuiltIn 2003.0 1724 1 GasA Ex 2Story 2862 1 Gd Lvl Gtl 7162 Inside 62.0 Reg 0 60 RL 190.0 BrkFace NaN 0 5 Edwards 57 5 7 Y 0 NaN CompShg Hip Normal NaN WD 0 Pave 8 796.0 AllPub 168 2003 2004 2006
2860 914 0 0 NaN 2 1Fam TA Av 475.0 297.0 GLQ ALQ 1.0 0.0 Gd 142.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd NaN NaN 0 PConc 1 Typ 0.0 0.0 NaN NaN NaN NaN NaN 914 0 GasA Ex 1Story 2863 1 Gd Lvl Gtl 8050 Inside 75.0 Reg 0 20 RL NaN NaN NaN 0 4 Edwards 0 5 6 N 0 NaN CompShg Gable Normal NaN WD 0 Pave 4 914.0 AllPub 32 2002 2002 2006
2861 1164 1150 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Ex 1150.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Ex 1 PConc 2 Typ 502.0 2.0 TA Fin TA BuiltIn 2003.0 2314 1 GasA Ex 2Story 2864 1 Gd Lvl Gtl 11060 Corner 90.0 IR1 0 60 RL 0.0 None NaN 0 2 Edwards 274 5 7 Y 0 NaN CompShg Gable Normal NaN ConLD 0 Pave 9 1150.0 AllPub 0 2003 2005 2006
2862 1072 0 0 NaN 2 TwnhsE TA Gd 547.0 0.0 GLQ Unf 1.0 0.0 Gd 0.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd NaN NaN 0 PConc 1 Typ 525.0 2.0 TA Fin TA Basment 2005.0 1072 0 GasA Gd SFoyer 2865 1 TA Lvl Gtl 3675 Inside 35.0 Reg 0 180 RM 82.0 BrkFace NaN 0 10 Edwards 44 5 6 Y 0 NaN CompShg Gable Partial NaN New 0 Pave 5 547.0 AllPub 0 2005 2006 2006
2863 970 739 0 NaN 3 Twnhs TA No 0.0 0.0 Unf Unf 0.0 0.0 Gd 970.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Maj1 380.0 2.0 TA Unf TA Detchd 2004.0 1709 0 GasA Ex 2Story 2866 1 Gd Lvl Gtl 2522 Inside 24.0 Reg 0 160 RM 50.0 Stone NaN 0 5 Edwards 40 5 7 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 7 970.0 AllPub 0 2004 2004 2006
2866 1093 576 0 NaN 4 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 1093.0 N Feedr Norm FuseF 56 TA TA Wd Sdng Wd Sdng NaN NaN 0 BrkTil 1 Min2 288.0 1.0 TA Unf Fa Attchd 1924.0 1669 1 GasA TA 1.5Fin 2869 1 TA Lvl Gtl 8707 FR2 62.0 Reg 0 50 RL 0.0 None NaN 0 5 Edwards 0 5 4 Y 0 NaN CompShg Gable AdjLand NaN WD 0 Pave 9 1093.0 AllPub 0 1924 1950 2006
2871 1058 493 0 NaN 3 1Fam TA No 930.0 0.0 LwQ Unf 1.0 0.0 Fa 128.0 Y Norm Norm SBrkr 0 TA TA MetalSd MetalSd MnPrv NaN 0 BrkTil 2 Typ 240.0 1.0 TA Unf Fa Detchd 1938.0 1551 0 GasA TA 1.5Fin 2874 1 Fa Lvl Gtl 10890 Inside 60.0 Reg 0 50 RL 0.0 None NaN 0 7 SWISU 0 5 5 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 1058.0 AllPub 0 1938 1950 2006
2888 967 671 0 NaN 4 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 967.0 Y Norm Norm SBrkr 0 TA TA MetalSd MetalSd NaN NaN 0 CBlock 2 Typ 384.0 1.0 TA Unf TA Detchd 1957.0 1638 0 GasA Gd 1.5Fin 2891 1 Gd Lvl Gtl 9060 Inside 75.0 Reg 0 50 RM 327.0 BrkFace NaN 0 4 IDOTRR 21 5 6 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 967.0 AllPub 0 1957 1957 2006
2892 1778 0 0 NaN 2 TwnhsE TA Gd 1573.0 0.0 GLQ Unf 2.0 0.0 Ex 0.0 Y Norm Norm SBrkr 0 TA Gd CemntBd CmentBd NaN Gd 1 PConc 2 Typ 495.0 2.0 TA Fin TA Attchd 2005.0 1778 0 GasA Ex 1Story 2895 1 Ex HLS Mod 5748 Inside 41.0 IR1 0 120 RM 473.0 Stone NaN 0 2 Crawfor 53 5 8 Y 0 NaN CompShg Hip Partial NaN New 153 Pave 5 1573.0 AllPub 123 2005 2006 2006
2893 1646 0 0 NaN 2 TwnhsE TA Gd 1564.0 0.0 GLQ Unf 1.0 1.0 Ex 30.0 Y Norm Norm SBrkr 0 TA Gd CemntBd CmentBd NaN Gd 1 PConc 2 Typ 525.0 2.0 TA Fin TA Attchd 2004.0 1646 0 GasA Ex 1Story 2896 1 Gd HLS Mod 3842 Inside 44.0 IR1 0 120 RM 186.0 Stone NaN 0 12 Crawfor 53 5 8 Y 0 NaN CompShg Hip Normal NaN WD 155 Pave 5 1594.0 AllPub 128 2004 2005 2006
2895 1664 0 0 NaN 4 Duplex TA Mn 0.0 0.0 Unf Unf 0.0 0.0 TA 1664.0 Y Norm Norm SBrkr 0 TA TA Plywood Plywood NaN NaN 0 CBlock 2 Typ 616.0 2.0 TA Unf TA 2Types 1978.0 1664 0 GasA TA 1Story 2898 2 TA Lvl Gtl 8385 Inside 65.0 Reg 0 90 RL 0.0 None NaN 0 10 Mitchel 0 5 6 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 10 1664.0 AllPub 0 1978 1978 2006
2896 1491 0 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Ex 1491.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 PConc 2 Typ 490.0 2.0 TA RFn TA Attchd 2001.0 1491 0 GasA Ex 1Story 2899 1 Gd Lvl Gtl 9116 Corner 70.0 Reg 0 20 RL 0.0 None NaN 0 5 Mitchel 100 5 8 Y 0 NaN CompShg Hip Normal NaN WD 0 Pave 7 1491.0 AllPub 120 2001 2001 2006
2898 1650 0 0 NaN 2 1Fam TA Gd 909.0 0.0 BLQ Unf 1.0 0.0 Gd 723.0 Y Norm Norm SBrkr 0 TA TA Plywood Plywood NaN Gd 2 CBlock 1 Typ 518.0 2.0 TA Unf TA Attchd 1958.0 1650 0 GasA TA 1Story 2901 1 TA Low Mod 50102 Inside NaN IR1 0 20 RL 0.0 None NaN 0 3 Timber 0 5 6 Y 0 NaN Tar&Grv Gable Alloca NaN WD 138 Pave 6 1632.0 AllPub 0 1958 1958 2006
2899 1403 0 0 NaN 2 1Fam TA Av 1136.0 116.0 GLQ BLQ 1.0 0.0 Gd 129.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 Wood 2 Typ 470.0 2.0 TA Unf TA Attchd 2000.0 1403 0 GasA Ex 1Story 2902 1 Gd Lvl Gtl 8098 Inside NaN IR1 0 20 RL 0.0 None NaN 0 10 Timber 173 5 6 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 5 1381.0 AllPub 0 2000 2000 2006
2901 1838 0 0 NaN 3 1Fam TA Gd 1455.0 0.0 GLQ Unf 1.0 0.0 Gd 383.0 Y Norm Norm SBrkr 0 TA Ex VinylSd VinylSd NaN Gd 1 PConc 2 Typ 682.0 3.0 TA Fin TA Attchd 2005.0 1838 0 GasA Ex 1Story 2904 1 Ex Lvl Gtl 11577 Inside 88.0 Reg 0 20 RL 382.0 BrkFace NaN 0 9 Timber 225 5 9 Y 0 NaN CompShg Hip Partial NaN New 0 Pave 9 1838.0 AllPub 161 2005 2006 2006
2903 1368 0 0 NaN 2 Duplex TA Gd 1243.0 0.0 GLQ Unf 2.0 0.0 Gd 45.0 Y Norm Norm SBrkr 0 Gd TA MetalSd MetalSd NaN NaN 0 PConc 2 Typ 784.0 4.0 TA Fin TA Attchd 1997.0 1368 0 GasA Gd SFoyer 2906 2 TA Lvl Gtl 7020 Inside 78.0 Reg 0 90 RM 200.0 BrkFace NaN 0 11 Mitchel 48 5 7 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 8 1288.0 AllPub 0 1997 1997 2006
2906 1652 0 0 NaN 4 Duplex TA No 149.0 0.0 BLQ Unf 0.0 0.0 TA 1503.0 Y Norm Norm SBrkr 0 TA TA Plywood Plywood NaN NaN 0 CBlock 2 Typ 928.0 3.0 TA Unf TA 2Types 1970.0 1652 0 GasA TA 1Story 2909 2 TA Lvl Gtl 11836 Corner NaN IR1 0 90 RL 0.0 None NaN 0 3 Mitchel 0 5 5 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 8 1652.0 AllPub 0 1970 1970 2006
2909 1360 0 0 NaN 3 1Fam TA Av 119.0 344.0 Rec BLQ 1.0 0.0 TA 641.0 Y Norm Norm SBrkr 0 TA TA Plywood Plywood NaN TA 1 PConc 1 Typ 336.0 1.0 TA RFn TA Attchd 1969.0 1360 0 GasA Fa 1Story 2912 1 TA Lvl Mod 13384 Inside 80.0 Reg 0 20 RL 194.0 BrkFace NaN 0 5 Mitchel 0 5 5 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 8 1104.0 AllPub 160 1969 1979 2006
2910 546 546 0 NaN 3 Twnhs TA No 408.0 0.0 Rec Unf 0.0 0.0 TA 138.0 Y Norm Norm SBrkr 0 TA TA CemntBd CmentBd NaN NaN 0 CBlock 1 Typ 286.0 1.0 TA Unf TA CarPort 1970.0 1092 1 GasA TA 2Story 2913 1 TA Lvl Gtl 1533 Inside 21.0 Reg 0 160 RM 0.0 None NaN 0 12 MeadowV 0 5 4 Y 0 NaN CompShg Gable Abnorml NaN WD 0 Pave 5 546.0 AllPub 0 1970 1970 2006
2911 546 546 0 NaN 3 Twnhs TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 546.0 Y Norm Norm SBrkr 0 TA TA CemntBd CmentBd GdPrv NaN 0 CBlock 1 Typ 0.0 0.0 NaN NaN NaN NaN NaN 1092 1 GasA TA 2Story 2914 1 TA Lvl Gtl 1526 Inside 21.0 Reg 0 160 RM 0.0 None NaN 0 6 MeadowV 34 5 4 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 5 546.0 AllPub 0 1970 1970 2006
2913 546 546 0 NaN 3 TwnhsE TA No 252.0 0.0 Rec Unf 0.0 0.0 TA 294.0 Y Norm Norm SBrkr 0 TA TA CemntBd CmentBd NaN NaN 0 CBlock 1 Typ 286.0 1.0 TA Unf TA CarPort 1970.0 1092 1 GasA TA 2Story 2916 1 TA Lvl Gtl 1894 Inside 21.0 Reg 0 160 RM 0.0 None NaN 0 4 MeadowV 24 5 4 Y 0 NaN CompShg Gable Abnorml NaN WD 0 Pave 6 546.0 AllPub 0 1970 1970 2006
2915 970 0 0 NaN 3 1Fam TA Av 337.0 0.0 GLQ Unf 0.0 1.0 Gd 575.0 Y Norm Norm SBrkr 0 TA TA HdBoard Wd Shng MnPrv NaN 0 PConc 1 Typ 0.0 0.0 NaN NaN NaN NaN NaN 970 0 GasA TA SFoyer 2918 1 TA Lvl Gtl 10441 Inside 62.0 Reg 0 85 RL 0.0 None Shed 700 7 Mitchel 32 5 5 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 912.0 AllPub 80 1992 1992 2006
2916 996 1004 0 NaN 3 1Fam TA Av 758.0 0.0 LwQ Unf 0.0 0.0 Gd 238.0 Y Norm Norm SBrkr 0 TA TA HdBoard HdBoard NaN TA 1 PConc 2 Typ 650.0 3.0 TA Fin TA Attchd 1993.0 2000 1 GasA Ex 2Story 2919 1 TA Lvl Mod 9627 Inside 74.0 Reg 0 60 RL 94.0 BrkFace NaN 0 11 Mitchel 48 5 7 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 9 996.0 AllPub 190 1993 1994 2006

1504 rows × 81 columns

I see a lot of blue, going with TA

In [174]:
data['BsmtCond'].loc[(data['BsmtCond'].isnull()) & ((data.BsmtFinSF1 ==1044 ) | (data['BsmtFinSF1']==1033) | (data['BsmtFinSF1'] ==755))] = 'TA'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

BsmtExposure

In [175]:
data.loc[(data.TotalBsmtSF != 0) & (data.BsmtExposure.isnull())]
Out[175]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
947 936 840 0 NaN 3 1Fam TA NaN 0.0 0.0 Unf Unf 0.0 0.0 Gd 936.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN TA 1 PConc 2 Typ 474.0 2.0 TA RFn TA Attchd 2002.0 1776 1 GasA Ex 2Story 949 1 Gd Lvl Gtl 14006 Inside 65.0 IR1 0 60 RL 144.0 BrkFace NaN 0 2 CollgCr 96 5 7 Y 0 NaN CompShg Gable Normal 192500.0 WD 0 Pave 7 936.0 AllPub 144 2002 2002 2006
1485 1595 0 0 NaN 2 1Fam TA NaN 0.0 0.0 Unf Unf 0.0 0.0 Gd 1595.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 880.0 3.0 TA RFn TA Attchd 2005.0 1595 0 GasA Ex 1Story 1488 1 Gd Lvl Gtl 8987 Inside 73.0 Reg 0 20 RL 226.0 BrkFace NaN 0 5 Somerst 0 5 8 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 1595.0 AllPub 144 2005 2006 2010
2118 896 0 0 NaN 2 1Fam NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Y Feedr Norm FuseA 0 TA TA MetalSd CBlock MnPrv NaN 0 PConc 1 Typ 280.0 1.0 TA Unf TA Detchd 1946.0 896 0 GasA TA 1Story 2121 1 TA Lvl Gtl 5940 FR3 99.0 IR1 0 20 RM 0.0 None NaN 0 4 BrkSide 0 7 4 Y 0 NaN CompShg Gable Abnorml NaN ConLD 0 Pave 4 NaN AllPub 0 1946 1950 2008
2346 725 863 0 NaN 3 1Fam TA NaN 0.0 0.0 Unf Unf 0.0 0.0 Gd 725.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN NaN 0 CBlock 3 Typ 561.0 2.0 TA Unf TA Attchd 2007.0 1588 0 GasA Ex 2Story 2349 1 Gd Lvl Gtl 10411 Corner 81.0 Reg 0 60 FV 0.0 None NaN 0 7 Somerst 0 5 5 Y 0 NaN CompShg Gable Partial NaN New 0 Pave 8 725.0 AllPub 0 2007 2007 2007
In [176]:
data.loc[(data['BsmtExposure'].isnull()) & ((data.index==948 ) | (data.index==1487) | (data.index ==2348))] 
Out[176]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold

Going to compare it to BsmtCond

In [177]:
plt.figure(figsize=(16,18))
g = sns.relplot(x="TotalBsmtSF", y="BsmtCond", hue="BsmtExposure",
            sizes=(40, 400), alpha=.5, palette="muted",
            height=6, data=data)
g.set(xlim=(0, 2500))
Out[177]:
<seaborn.axisgrid.FacetGrid at 0x279d9a03ef0>
<Figure size 1600x1800 with 0 Axes>

Too many values to see anything. Going to try overallcond since they all have a 5 cond

In [178]:
# change height and use xlim if you need a closer look
g = sns.lmplot( x="TotalBsmtSF", y="OverallCond", data=data, fit_reg=False, hue='BsmtExposure', scatter_kws={"s": 50},height=5)

I'm seeing a lot of red at 5, so I will impute with Av

In [179]:
data['BsmtExposure'].loc[(data['BsmtExposure'].isnull()) & ((data.index==948 ) | (data.index==1487) | (data.index ==2348))] ='Av'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

BsmtFinType2

In [180]:
data.loc[(data['BsmtFinType2'].isnull()) & data['BsmtFinType1'].notnull()]
Out[180]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
332 1629 0 0 NaN 3 1Fam TA No 1124.0 479.0 GLQ NaN 1.0 0.0 Gd 1603.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 880.0 3.0 TA RFn TA Attchd 2003.0 1629 0 GasA Ex 1Story 333 1 Gd Lvl Gtl 10655 Inside 85.0 IR1 0 20 RL 296.0 BrkFace NaN 0 10 NridgHt 0 5 8 Y 0 NaN CompShg Gable Normal 284000.0 WD 0 Pave 7 3206.0 AllPub 0 2003 2004 2009
In [181]:
data.loc[(data.BsmtFinType2.isnull()) & (data.index ==332)]
Out[181]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
332 1629 0 0 NaN 3 1Fam TA No 1124.0 479.0 GLQ NaN 1.0 0.0 Gd 1603.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd NaN Gd 1 PConc 2 Typ 880.0 3.0 TA RFn TA Attchd 2003.0 1629 0 GasA Ex 1Story 333 1 Gd Lvl Gtl 10655 Inside 85.0 IR1 0 20 RL 296.0 BrkFace NaN 0 10 NridgHt 0 5 8 Y 0 NaN CompShg Gable Normal 284000.0 WD 0 Pave 7 3206.0 AllPub 0 2003 2004 2009
In [182]:
g = sns.relplot(x="TotalBsmtSF", y="BsmtQual", hue="BsmtFinType2",
            sizes=(40, 400), alpha=.5, palette="muted",
            height=6, data=data)
g.set(xlim=(0, 2500))
Out[182]:
<seaborn.axisgrid.FacetGrid at 0x279e1d54d68>

Clearly Unf for this data

In [183]:
data['BsmtFinType2'].loc[(data.BsmtFinType2.isnull()) & (data.index ==332)] = 'Unf'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

BsmtFullBath & BsmtHalfBath

In [184]:
data.loc[(data.BsmtFullBath.isnull()) & (data.BsmtHalfBath.isnull())]
Out[184]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2118 896 0 0 NaN 2 1Fam NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Y Feedr Norm FuseA 0 TA TA MetalSd CBlock MnPrv NaN 0 PConc 1 Typ 280.0 1.0 TA Unf TA Detchd 1946.0 896 0 GasA TA 1Story 2121 1 TA Lvl Gtl 5940 FR3 99.0 IR1 0 20 RM 0.0 None NaN 0 4 BrkSide 0 7 4 Y 0 NaN CompShg Gable Abnorml NaN ConLD 0 Pave 4 NaN AllPub 0 1946 1950 2008
2186 3820 0 0 NaN 5 1Fam NaN NaN 0.0 0.0 NaN NaN NaN NaN NaN 0.0 Y Norm Norm SBrkr 0 TA TA Plywood Plywood NaN Gd 2 Slab 3 Typ 624.0 2.0 TA Unf TA Attchd 1959.0 3820 1 GasA TA 1Story 2189 1 Ex Lvl Gtl 47007 Inside 123.0 IR1 0 20 RL 0.0 None NaN 0 7 Edwards 372 7 5 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 11 0.0 AllPub 0 1959 1996 2008

No need to make any changes to these specifically, theyre good for the mass fill

BsmtQual

In [185]:
data.loc[(data.BsmtQual.isnull()) & (data.BsmtExposure.notnull())]
Out[185]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2215 825 536 0 NaN 2 1Fam Fa No 0.0 0.0 Unf Unf 0.0 0.0 NaN 173.0 N Feedr Norm SBrkr 0 TA TA Wd Sdng Wd Sdng NaN NaN 0 Stone 1 Typ 185.0 1.0 TA Unf Fa Detchd 1895.0 1361 0 GasA Ex 2Story 2218 1 TA Lvl Gtl 5280 Corner 60.0 Reg 0 70 C (all) 0.0 None NaN 0 7 IDOTRR 123 7 4 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 173.0 AllPub 0 1895 1950 2008
2216 671 378 0 NaN 2 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 NaN 356.0 N Feedr Norm FuseA 0 TA TA Plywood Plywood NaN NaN 0 PConc 1 Typ 195.0 1.0 Fa Unf Po Detchd 1910.0 1049 0 GasA TA 1.5Fin 2219 1 TA Lvl Gtl 5150 Corner 52.0 Reg 0 50 C (all) 0.0 None NaN 0 5 IDOTRR 0 7 4 N 0 NaN CompShg Gable Normal NaN WD 0 Pave 5 356.0 AllPub 0 1910 2000 2008
In [186]:
g = sns.relplot(x="TotalBsmtSF", y="BsmtExposure", hue="BsmtQual",
            sizes=(40, 400), alpha=.5, palette="muted",
            height=6, data=data)
g.set(xlim=(100, 700))
Out[186]:
<seaborn.axisgrid.FacetGrid at 0x279dcf085f8>

I see mostly Orange so going with TA

In [187]:
data['BsmtQual'].loc[(data.BsmtQual.isnull()) & ((data.index ==2217) | (data.index==2218))] = 'TA'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [188]:
data.isnull().sum()
Out[188]:
1stFlrSF            0
2ndFlrSF            0
3SsnPorch           0
Alley            2719
BedroomAbvGr        0
BldgType            0
BsmtCond           79
BsmtExposure       82
BsmtFinSF1          1
BsmtFinSF2          1
BsmtFinType1       79
BsmtFinType2       79
BsmtFullBath        2
BsmtHalfBath        2
BsmtQual           81
BsmtUnfSF           1
CentralAir          0
Condition1          0
Condition2          0
Electrical          1
EnclosedPorch       0
ExterCond           0
ExterQual           0
Exterior1st         1
Exterior2nd         1
Fence            2346
FireplaceQu      1420
Fireplaces          0
Foundation          0
FullBath            0
                 ... 
LotShape            0
LowQualFinSF        0
MSSubClass          0
MSZoning            4
MasVnrArea         23
MasVnrType         24
MiscFeature      2812
MiscVal             0
MoSold              0
Neighborhood        0
OpenPorchSF         0
OverallCond         0
OverallQual         0
PavedDrive          0
PoolArea            0
PoolQC           2905
RoofMatl            0
RoofStyle           0
SaleCondition       0
SalePrice        1459
SaleType            1
ScreenPorch         0
Street              0
TotRmsAbvGrd        0
TotalBsmtSF         1
Utilities           2
WoodDeckSF          0
YearBuilt           0
YearRemodAdd        0
YrSold              0
Length: 81, dtype: int64

Done with Basements!

Electrical

In [189]:
data.loc[data.Electrical.isnull()]
Out[189]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
1377 754 640 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 Gd 384.0 Y Norm Norm NaN 0 TA TA VinylSd VinylSd NaN NaN 0 PConc 2 Typ 400.0 2.0 TA Fin TA BuiltIn 2007.0 1394 1 GasA Gd SLvl 1380 1 Gd Lvl Gtl 9735 Inside 73.0 Reg 0 80 RL 0.0 None NaN 0 5 Timber 0 5 5 Y 0 NaN CompShg Gable Normal 167500.0 WD 0 Pave 7 384.0 AllPub 100 2006 2007 2008
In [190]:
# going to impute with most common, also since it was built in 2006, SBrkr is most likely what it is
data['Electrical'].loc[(data.index==1379)] = 'SBrkr'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Exterior1st and 2nd

In [191]:
data.loc[(data['Exterior1st'].isnull()) & (data['Exterior2nd'].isnull())]
Out[191]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2149 1518 0 0 NaN 2 1Fam TA Gd 1035.0 0.0 ALQ Unf 1.0 0.0 TA 545.0 Y Norm Norm SBrkr 0 TA TA NaN NaN NaN Gd 2 PConc 1 Typ 0.0 0.0 NaN NaN NaN NaN NaN 1518 0 GasA Ex 1Story 2152 1 Fa Lvl Gtl 19550 Inside 85.0 Reg 0 30 RL 0.0 None NaN 0 1 Edwards 39 7 5 Y 0 NaN Tar&Grv Flat Normal NaN WD 0 Pave 5 1580.0 AllPub 0 1940 2007 2008
In [192]:
#data.loc[(data['Exterior1st'] == data['Exterior2nd'])]
# this shows 2400+ rows where the 1st and 2nd exterior are the same
In [193]:
data.Exterior2nd.value_counts()
Out[193]:
VinylSd    1014
MetalSd     447
HdBoard     406
Wd Sdng     391
Plywood     270
CmentBd     125
Wd Shng      81
BrkFace      47
Stucco       46
AsbShng      38
Brk Cmn      22
ImStucc      15
Stone         6
AsphShn       4
CBlock        3
Other         1
Name: Exterior2nd, dtype: int64
In [194]:
# change height and use xlim if you need a closer look
plt.figure(figsize=(16,8))
g = sns.boxplot( x="Exterior2nd", y="YearBuilt", data=data)

Kind of all over the place but since they did do a remodel I'm going with Vinyl Siding which is the most common

In [195]:
data['Exterior1st'].loc[(data.index==2151)] ='VinylSd'
data['Exterior2nd'].loc[(data.index==2151)] ='VinylSd'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Functional

In [196]:
data.loc[(data.Functional.isnull())]
Out[196]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2214 733 0 0 NaN 2 1Fam NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN 0.0 N Norm Norm FuseA 0 Po Fa AsbShng VinylSd NaN NaN 0 Slab 1 NaN 487.0 2.0 Po Unf Fa Attchd 1952.0 733 0 Wall Po 1Story 2217 1 Fa Low Mod 14584 Inside 80.0 Reg 0 20 NaN 0.0 None NaN 0 2 IDOTRR 0 5 1 N 0 NaN CompShg Gable Abnorml NaN WD 0 Pave 4 0.0 AllPub 0 1952 1952 2008
2471 866 504 0 Grvl 3 1Fam Fa No 0.0 0.0 Unf Unf 0.0 0.0 TA 771.0 Y Artery Norm SBrkr 0 Fa Fa Wd Sdng Wd Sdng NaN NaN 0 CBlock 2 NaN 264.0 1.0 Fa Unf TA Detchd 1910.0 1484 0 GasA Fa 1.5Fin 2474 1 TA Lvl Gtl 10320 Corner 60.0 Reg 114 50 RM 0.0 None NaN 0 9 IDOTRR 211 1 4 N 0 NaN CompShg Gable Abnorml NaN COD 84 Pave 6 771.0 AllPub 14 1910 1950 2007
In [197]:
# Documentation says assume typical, so we assume typical
data['Functional'].loc[(data.index==2216) | (data.index==2473)] = 'Typ'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Garage

In [198]:
garage_list= [col for col in data if 'Garage' in col]
garage_list
Out[198]:
['GarageArea',
 'GarageCars',
 'GarageCond',
 'GarageFinish',
 'GarageQual',
 'GarageType',
 'GarageYrBlt']
In [199]:
garage_data = data[garage_list]
garage_data.isnull().sum()
Out[199]:
GarageArea        1
GarageCars        1
GarageCond      159
GarageFinish    159
GarageQual      159
GarageType      157
GarageYrBlt     159
dtype: int64
In [200]:
data.loc[(data.GarageType.notnull()) & (data.GarageCond.isnull())]
Out[200]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2124 1242 742 0 Grvl 5 2fmCon TA Mn 196.0 0.0 Rec Unf 0.0 0.0 TA 1046.0 Y Norm Norm SBrkr 180 TA TA Wd Sdng Wd Sdng MnPrv NaN 0 PConc 2 Typ 360.0 1.0 NaN NaN NaN Detchd NaN 1984 0 GasA Gd 2.5Unf 2127 1 TA Lvl Gtl 8094 Inside 57.0 Reg 0 60 RM 0.0 None Shed 1000 9 OldTown 0 8 6 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 8 1242.0 AllPub 64 1910 1983 2008
2574 942 886 0 NaN 3 1Fam TA No 548.0 0.0 ALQ Unf 0.0 0.0 Gd 311.0 Y Norm Norm SBrkr 212 TA TA Wd Sdng Plywood MnPrv NaN 0 BrkTil 2 Typ NaN NaN NaN NaN NaN Detchd NaN 1828 0 GasA Ex 2Story 2577 1 Gd Lvl Gtl 9060 Inside 50.0 Reg 0 70 RM 0.0 None NaN 0 3 IDOTRR 0 6 5 Y 0 NaN CompShg Gable Alloca NaN WD 0 Pave 6 859.0 AllPub 174 1923 1999 2007
In [201]:
# No other information for garage at 2576 so Im going to change detchd to NaN for this row
data['GarageType'].loc[(data.index==2576)] = np.NaN
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [202]:
data.loc[(data.GarageType.notnull()) & (data.GarageCond.isnull())]
Out[202]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2124 1242 742 0 Grvl 5 2fmCon TA Mn 196.0 0.0 Rec Unf 0.0 0.0 TA 1046.0 Y Norm Norm SBrkr 180 TA TA Wd Sdng Wd Sdng MnPrv NaN 0 PConc 2 Typ 360.0 1.0 NaN NaN NaN Detchd NaN 1984 0 GasA Gd 2.5Unf 2127 1 TA Lvl Gtl 8094 Inside 57.0 Reg 0 60 RM 0.0 None Shed 1000 9 OldTown 0 8 6 Y 0 NaN CompShg Gable Normal NaN WD 0 Pave 8 1242.0 AllPub 64 1910 1983 2008
2574 942 886 0 NaN 3 1Fam TA No 548.0 0.0 ALQ Unf 0.0 0.0 Gd 311.0 Y Norm Norm SBrkr 212 TA TA Wd Sdng Plywood MnPrv NaN 0 BrkTil 2 Typ NaN NaN NaN NaN NaN Detchd NaN 1828 0 GasA Ex 2Story 2577 1 Gd Lvl Gtl 9060 Inside 50.0 Reg 0 70 RM 0.0 None NaN 0 3 IDOTRR 0 6 5 Y 0 NaN CompShg Gable Alloca NaN WD 0 Pave 6 859.0 AllPub 174 1923 1999 2007

Need to fill in 4 values here

In [203]:
#data.loc[(data.GarageYrBlt == data.YearBuilt)]
# This showed 2200 rows where garage year built was equal to yearbuilt of the house so I will impute with that
In [204]:
# impute garageyrblt
data['GarageYrBlt'].loc[(data.index == 2126)] = data['YearBuilt'].loc[(data.index ==2126)]
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [205]:
#Impute garage qual with most common value, TA
data['GarageQual'].loc[(data.index == 2126)] = 'TA'
# impute GarageFinish with most common value, 'Unf'
data['GarageFinish'].loc[(data.index==2126)] = 'Unf'
#Impute GarageCond with most common value, TA
data['GarageCond'].loc[(data.index == 2126)] = 'TA'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Done with Garage

KitchenQual

In [206]:
data.loc[(data.KitchenQual.isnull())]
Out[206]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
1553 725 499 0 NaN 3 1Fam Fa No 0.0 0.0 Unf Unf 0.0 0.0 Gd 689.0 N Norm Norm SBrkr 248 TA TA Wd Sdng Wd Sdng NaN NaN 0 BrkTil 1 Mod 180.0 1.0 Fa Unf Fa Detchd 1917.0 1224 1 GasA Gd 1.5Fin 1556 1 NaN Lvl Gtl 10632 Inside 72.0 IR1 0 50 RL 0.0 None NaN 0 1 ClearCr 0 3 5 N 0 NaN CompShg Gable Normal NaN COD 0 Pave 6 689.0 AllPub 0 1917 1950 2010
In [207]:
data.KitchenQual.value_counts()
Out[207]:
TA    1492
Gd    1151
Ex     203
Fa      70
Name: KitchenQual, dtype: int64
In [208]:
# Imputing KitchenQual with most common, TA
data['KitchenQual'].loc[(data.index==1555)] ='TA'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [209]:
data.isnull().sum()
Out[209]:
1stFlrSF            0
2ndFlrSF            0
3SsnPorch           0
Alley            2719
BedroomAbvGr        0
BldgType            0
BsmtCond           79
BsmtExposure       82
BsmtFinSF1          1
BsmtFinSF2          1
BsmtFinType1       79
BsmtFinType2       79
BsmtFullBath        2
BsmtHalfBath        2
BsmtQual           81
BsmtUnfSF           1
CentralAir          0
Condition1          0
Condition2          0
Electrical          1
EnclosedPorch       0
ExterCond           0
ExterQual           0
Exterior1st         1
Exterior2nd         1
Fence            2346
FireplaceQu      1420
Fireplaces          0
Foundation          0
FullBath            0
                 ... 
LotShape            0
LowQualFinSF        0
MSSubClass          0
MSZoning            4
MasVnrArea         23
MasVnrType         24
MiscFeature      2812
MiscVal             0
MoSold              0
Neighborhood        0
OpenPorchSF         0
OverallCond         0
OverallQual         0
PavedDrive          0
PoolArea            0
PoolQC           2905
RoofMatl            0
RoofStyle           0
SaleCondition       0
SalePrice        1459
SaleType            1
ScreenPorch         0
Street              0
TotRmsAbvGrd        0
TotalBsmtSF         1
Utilities           2
WoodDeckSF          0
YearBuilt           0
YearRemodAdd        0
YrSold              0
Length: 81, dtype: int64

MSZoning

  • 4 values missing
In [210]:
data.loc[(data.MSZoning.isnull())]
Out[210]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
1913 810 0 0 NaN 1 1Fam NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN 0.0 N Norm Norm FuseA 0 Fa Fa Wd Sdng Wd Sdng NaN NaN 0 CBlock 1 Min1 280.0 1.0 TA Unf TA Detchd 1975.0 810 0 GasA TA 1Story 1916 1 TA Lvl Gtl 21780 Inside 109.0 Reg 0 30 NaN 0.0 None NaN 0 3 IDOTRR 24 4 2 N 0 NaN CompShg Gable Normal NaN ConLD 0 Grvl 4 0.0 NaN 119 1910 1950 2009
2214 733 0 0 NaN 2 1Fam NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN 0.0 N Norm Norm FuseA 0 Po Fa AsbShng VinylSd NaN NaN 0 Slab 1 NaN 487.0 2.0 Po Unf Fa Attchd 1952.0 733 0 Wall Po 1Story 2217 1 Fa Low Mod 14584 Inside 80.0 Reg 0 20 NaN 0.0 None NaN 0 2 IDOTRR 0 5 1 N 0 NaN CompShg Gable Abnorml NaN WD 0 Pave 4 0.0 AllPub 0 1952 1952 2008
2248 1150 686 0 NaN 4 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 686.0 Y Norm Norm SBrkr 0 TA TA Wd Sdng Wd Sdng NaN NaN 0 BrkTil 2 Maj1 288.0 1.0 Fa Unf TA Detchd 1900.0 1836 0 GasA Ex 2.5Unf 2251 1 TA Low Gtl 56600 Inside NaN IR1 0 70 NaN 0.0 None NaN 0 1 IDOTRR 0 1 5 N 0 NaN CompShg Hip Normal NaN WD 0 Pave 7 686.0 AllPub 0 1900 1950 2008
2902 1600 0 0 NaN 3 1Fam NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN 0.0 Y Artery Norm FuseA 135 Fa TA CBlock VinylSd NaN NaN 0 CBlock 1 Mod 270.0 1.0 TA Unf Fa Attchd 1951.0 1600 1 GasA TA 1Story 2905 1 TA Lvl Gtl 31250 Inside 125.0 Reg 0 20 NaN 0.0 None NaN 0 5 Mitchel 0 3 1 N 0 NaN CompShg Gable Normal NaN WD 0 Pave 6 0.0 AllPub 0 1951 1951 2006
In [211]:
# change height and use xlim if you need a closer look
g = sns.boxplot( x="MSZoning", y="LotArea", data=data)
g.set(ylim=(0,60000))
Out[211]:
[(0, 60000)]

Based on the plot above, and given the LotAreas of the data where MSZoning is NaN, I will impute with RL

In [212]:
data['MSZoning'].loc[(data.index==1915)|(data.index==2216)| (data.index==2250)| (data.index==2904)] = 'RL'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

MasVnrType & MasVnrArea

In [213]:
data.loc[(data.MasVnrType.isnull()) &(data.MasVnrArea.notnull())]
Out[213]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2608 1608 0 0 NaN 3 1Fam TA No 811.0 0.0 BLQ Unf 0.0 0.0 TA 585.0 N Norm Norm SBrkr 0 TA TA Plywood Plywood NaN NaN 0 CBlock 1 Typ 444.0 1.0 Fa Unf TA Attchd 1961.0 1608 0 GasA TA 1Story 2611 1 TA Lvl Gtl 27697 Inside 124.0 Reg 0 20 RL 198.0 NaN NaN 0 11 Mitchel 38 3 4 Y 0 NaN CompShg Shed Abnorml NaN COD 0 Pave 6 1396.0 AllPub 152 1961 1961 2007
In [214]:
data.MasVnrType.value_counts()
Out[214]:
None       1742
BrkFace     879
Stone       247
BrkCmn       25
Name: MasVnrType, dtype: int64
In [215]:
g = sns.boxplot( x="MasVnrType", y="MasVnrArea", data=data)
In [216]:
# Impute with 'BrkFace'
data['MasVnrType'].loc[(data.index==2610)] = 'BrkFace'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

SaleType

In [217]:
data.loc[(data.SaleType.isnull())]
Out[217]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
2487 1176 0 0 NaN 3 1Fam TA Mn 190.0 873.0 Rec BLQ 1.0 0.0 TA 95.0 Y Feedr Norm SBrkr 0 TA TA Plywood Plywood NaN Gd 2 CBlock 1 Typ 303.0 1.0 TA Unf TA Attchd 1958.0 1176 0 GasA TA 1Story 2490 1 TA Lvl Gtl 13770 Corner 85.0 Reg 0 20 RL 340.0 BrkFace NaN 0 10 Sawyer 0 6 5 Y 0 NaN CompShg Gable Normal NaN NaN 0 Pave 6 1158.0 AllPub 0 1958 1998 2007
In [218]:
data.SaleType.value_counts()
Out[218]:
WD       2525
New       237
COD        87
ConLD      26
CWD        12
ConLI       9
ConLw       8
Oth         7
Con         5
Name: SaleType, dtype: int64
In [219]:
# Imputing with most common
data['SaleType'].loc[(data.index==2489)] = 'WD'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Utilities

In [220]:
data.loc[(data.Utilities.isnull())]
Out[220]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotFrontage LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SalePrice SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
1913 810 0 0 NaN 1 1Fam NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN 0.0 N Norm Norm FuseA 0 Fa Fa Wd Sdng Wd Sdng NaN NaN 0 CBlock 1 Min1 280.0 1.0 TA Unf TA Detchd 1975.0 810 0 GasA TA 1Story 1916 1 TA Lvl Gtl 21780 Inside 109.0 Reg 0 30 NaN 0.0 None NaN 0 3 IDOTRR 24 4 2 N 0 NaN CompShg Gable Normal NaN ConLD 0 Grvl 4 0.0 NaN 119 1910 1950 2009
1943 1474 0 0 NaN 3 1Fam TA No 0.0 0.0 Unf Unf 0.0 0.0 TA 1632.0 Y Feedr Norm FuseA 144 TA TA BrkFace BrkFace NaN Gd 2 CBlock 1 Min2 495.0 2.0 TA Unf TA Attchd 1952.0 1474 0 GasA TA 1Story 1946 1 TA Bnk Gtl 31220 FR2 NaN IR1 0 20 RL 0.0 None Shed 750 5 Gilbert 0 2 6 Y 0 NaN CompShg Hip Normal NaN WD 0 Pave 7 1632.0 NaN 0 1952 1952 2008

Maybe drop utilities?

In [221]:
data.Utilities.value_counts()
Out[221]:
AllPub    2914
NoSeWa       1
Name: Utilities, dtype: int64
In [222]:
# They clearly have gas and electricity, no way to tell if they have all utilities but ill impute with allpub
data['Utilities'].loc[(data.index==1915) | (data.index==1945)] ='AllPub'
C:\Users\dusty\Anaconda3\lib\site-packages\pandas\core\indexing.py:189: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [223]:
# checking to make sure we are ready to fill in na values with 0 or none
updated_list_of_na_columns = data.columns[data.isna().any()].tolist()
updated_list_of_na_columns
Out[223]:
['Alley',
 'BsmtCond',
 'BsmtExposure',
 'BsmtFinSF1',
 'BsmtFinSF2',
 'BsmtFinType1',
 'BsmtFinType2',
 'BsmtFullBath',
 'BsmtHalfBath',
 'BsmtQual',
 'BsmtUnfSF',
 'Electrical',
 'Exterior1st',
 'Exterior2nd',
 'Fence',
 'FireplaceQu',
 'Functional',
 'GarageArea',
 'GarageCars',
 'GarageCond',
 'GarageFinish',
 'GarageQual',
 'GarageType',
 'GarageYrBlt',
 'KitchenQual',
 'LotFrontage',
 'MSZoning',
 'MasVnrArea',
 'MasVnrType',
 'MiscFeature',
 'PoolQC',
 'SalePrice',
 'SaleType',
 'TotalBsmtSF',
 'Utilities']

All Data is ready for the mass fill besides LotFrontage

In [224]:
# First need to pull out SalePrice 
lot_frontage = data['LotFrontage']
del data['LotFrontage']

Mass filling in of null Values

In [225]:
# Want to drop saleprice first so we can keep those as nans
target= data['SalePrice']
del data['SalePrice']
In [226]:
# List Comprehension to fill in the null values, if column is Object fillna with None, else fillna with 0
# assigning it to a random variable so 'none' doesn't get printed a million times
_ = [data[col].fillna('None', inplace=True) if (data[col].dtype=='O') else data[col].fillna(0, inplace=True) for col in data]
In [227]:
# and add back in so we can explore the data
data=data.join(target)

Data Exploration

  • Before I impute Lot frontage with ML I want to explore the data a little
  • In this case I don't want to take dummies on objects yet because I want to explore some of the categorical variables as well

Since There are a lot of things to compare I will create a function for simple plotly graphs so I can call it whenever

In [228]:
# Need to do try and except and also multiple kwargs
def plotly_plot(df, colx, coly, chart_type,**kwargs):
    #try:
        #print (go.chart_type)
        trace = chart_type(x=df[colx], y=df[coly], **kwargs)
        plot = [trace]
        layout = go.Layout(
                xaxis=dict(
                    title = colx,
                        titlefont=dict(
                            family='Courier New, monospace',
                                size=18,
                                    color='#000000'
                        )
                ),
                yaxis = dict(
                    title=coly,
                        titlefont=dict(
                            family='Courier New, monospace',
                                size=18,
                                    color='#000000'
                        )
                )
        )
        fig=dict(data=plot,layout=layout)
        return offline.iplot(fig)
    #except:
        #print('Please use (go.) before your chart_type of choice')
    
In [229]:
plotly_plot(data, 'OverallCond', 'SalePrice', go.Box)

Its really interesting to see the outliers at the 5 and 6 overallcond. Higher OverallCond does not mean higher price

Lets check OverallQual

In [230]:
plotly_plot(data, 'OverallQual', 'SalePrice', go.Box)

So here we see the trend

In [231]:
## Checking to see if there is a relationship between bedrooms(above ground) and sale price
plotly_plot(data, 'BedroomAbvGr', 'SalePrice', go.Box)
In [232]:
# what about Total Rooms?
plotly_plot(data, 'TotRmsAbvGrd', 'SalePrice', go.Box)

somewhat of a Trend. This makes me think about rooms and sqft.

  • Id assume there is a relationship
In [233]:
plotly_plot(data, 'TotRmsAbvGrd', 'GrLivArea', go.Box)
In [234]:
bed_bath_group = data.groupby('BedroomAbvGr', as_index=False)['FullBath'].agg('mean')
plotly_plot(bed_bath_group, 'BedroomAbvGr', 'FullBath',go.Scatter)
In [235]:
# setting up groupbys for the chart
max_bath_group = data.groupby('BedroomAbvGr', as_index=False)['FullBath'].agg('max')
avg_bath_group = data.groupby('BedroomAbvGr', as_index=False)['FullBath'].agg('mean')
min_bath_group = data.groupby('BedroomAbvGr', as_index=False)['FullBath'].agg('min')
# Create and style traces
trace0 = go.Scatter(
    x = max_bath_group['BedroomAbvGr'],
    y = max_bath_group['FullBath'],
    name = 'Max baths',
    line = dict(
        color = ('rgb(76, 153, 0)'),
        width = 4)
)
trace1 = go.Scatter(
    x = avg_bath_group['BedroomAbvGr'],
    y = avg_bath_group['FullBath'],
    name = 'Avg Baths',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,)
)
trace2 = go.Scatter(
    x = min_bath_group['BedroomAbvGr'],
    y = min_bath_group['FullBath'],
    name = 'Min baths',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4,
        dash = 'dash') # dash options include 'dash', 'dot', and 'dashdot'
)

plot = [trace0, trace1, trace2]

# Edit the layout
layout = dict(title = 'Min, Avg, and Max bathrooms per bedrooms',
              xaxis = dict(title = 'Bedrooms(Above Ground)'),
              yaxis = dict(title = 'Baths'),
              )

fig = dict(data=plot, layout=layout)
iplot(fig)

Interesting that some places dont have bathrooms?

In [236]:
data.loc[(data.FullBath ==0)]
Out[236]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold SalePrice
53 1842 0 0 None 0 1Fam TA Gd 1810.0 0.0 GLQ Unf 2.0 0.0 Ex 32.0 Y Norm Norm SBrkr 0 TA Gd WdShing Wd Shng None Gd 1 CBlock 0 Typ 894.0 3.0 TA Fin TA Attchd 1981.0 1842 1 GasA Gd 1Story 54 1 Gd Low Gtl 50271 Inside IR1 0 20 RL 0.0 None None 0 11 Veenker 72 5 9 Y 0 None WdShngl Gable Normal WD 0 Pave 5 1842.0 AllPub 857 1981 1987 2006 385000.0
188 1224 0 0 None 2 Duplex TA Av 1086.0 0.0 GLQ Unf 2.0 0.0 Gd 0.0 Y Feedr Norm SBrkr 0 TA TA Plywood Plywood None TA 2 CBlock 0 Typ 528.0 2.0 TA Unf TA Detchd 1979.0 1224 2 GasA TA SFoyer 189 2 TA Bnk Gtl 7018 Inside Reg 0 90 RL 275.0 Stone None 0 6 SawyerW 0 5 5 Y 0 None CompShg Gable Alloca WD 0 Pave 6 1086.0 AllPub 120 1979 1979 2009 153337.0
375 904 0 0 None 1 1Fam Po Gd 350.0 0.0 BLQ Unf 1.0 0.0 Fa 333.0 N Norm Norm FuseA 0 Fa Fa Wd Sdng Wd Sdng None None 0 BrkTil 0 Maj1 0.0 0.0 None None None None 0.0 904 1 GasA Gd 1Story 376 1 Fa Low Sev 10020 Inside IR1 0 30 RL 0.0 None None 0 3 Edwards 0 1 1 Y 0 None CompShg Gable Normal WD 0 Pave 4 683.0 AllPub 0 1922 1950 2009 61000.0
596 1402 0 0 None 2 TwnhsE TA Av 0.0 0.0 Unf Unf 0.0 2.0 Ex 1258.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd None Gd 1 PConc 0 Typ 648.0 3.0 TA Fin TA Attchd 2006.0 1402 2 GasA Ex 1Story 598 1 Gd Lvl Gtl 3922 Inside Reg 0 120 RL 72.0 BrkFace None 0 2 Blmngtn 16 5 7 Y 0 None CompShg Gable Partial New 0 Pave 7 1258.0 AllPub 120 2006 2007 2007 194201.0
633 1056 0 0 None 0 Duplex TA No 1056.0 0.0 GLQ Unf 2.0 0.0 TA 0.0 Y Norm Norm SBrkr 0 TA TA Plywood Plywood GdPrv None 0 CBlock 0 Typ 576.0 2.0 TA Unf TA Detchd 1980.0 1056 0 GasA Gd SFoyer 635 2 TA Lvl Gtl 6979 Inside Reg 0 90 RL 0.0 None Shed 600 6 OldTown 56 5 6 Y 0 None CompShg Gable Normal WD 0 Pave 4 1056.0 AllPub 264 1980 1980 2010 144000.0
915 480 0 0 None 1 1Fam TA Av 50.0 0.0 BLQ Unf 1.0 0.0 TA 430.0 N Norm Norm FuseA 0 TA TA AsbShng AsbShng None None 0 CBlock 0 Typ 308.0 1.0 TA Unf TA Detchd 1958.0 480 0 GasA TA 1Story 917 1 TA Lvl Gtl 9000 Inside Reg 0 20 C (all) 0.0 None None 0 10 IDOTRR 0 3 2 Y 0 None CompShg Gable Abnorml WD 0 Pave 4 480.0 AllPub 0 1949 1950 2006 35311.0
1162 1258 0 0 None 0 Duplex TA Av 1198.0 0.0 GLQ Unf 2.0 0.0 Gd 0.0 Y Feedr Norm SBrkr 0 TA TA Plywood Plywood None None 0 CBlock 0 Typ 400.0 2.0 TA Unf Fa CarPort 1969.0 1258 2 GasA TA SFoyer 1164 2 TA Lvl Gtl 12900 Inside Reg 0 90 RL 0.0 None None 0 1 Sawyer 0 4 4 Y 0 None CompShg Gable Alloca WD 0 Pave 6 1198.0 AllPub 120 1969 1969 2008 108959.0
1212 960 0 0 None 0 1Fam Gd Av 648.0 0.0 GLQ Unf 1.0 1.0 TA 0.0 Y Norm Norm SBrkr 0 Gd TA VinylSd VinylSd None None 0 CBlock 0 Typ 364.0 1.0 TA Unf TA Attchd 1965.0 960 0 GasA Ex SLvl 1214 1 TA Lvl Gtl 10246 CulDSac IR1 0 80 RL 0.0 None None 0 5 Sawyer 0 9 4 Y 0 None CompShg Gable Normal WD 0 Pave 3 648.0 AllPub 88 1965 2001 2006 145000.0
1269 1332 192 0 None 0 1Fam TA Gd 1258.0 0.0 GLQ Unf 2.0 0.0 Gd 74.0 Y Norm Norm SBrkr 0 TA Gd Plywood Plywood None TA 1 PConc 0 Typ 586.0 2.0 TA Fin TA Attchd 1979.0 1524 1 GasA TA 1Story 1271 1 Gd Low Sev 23595 Inside Reg 0 40 RL 0.0 None None 0 4 ClearCr 0 6 7 Y 0 None WdShake Shed Normal WD 0 Pave 4 1332.0 AllPub 268 1979 1979 2010 260000.0
1857 1229 0 0 None 2 Duplex TA Av 1094.0 0.0 GLQ Unf 2.0 0.0 Gd 0.0 Y Feedr Norm SBrkr 0 TA TA Plywood Plywood None TA 2 CBlock 0 Typ 672.0 2.0 TA Unf TA Detchd 1979.0 1229 2 GasA TA SFoyer 1860 2 Gd Lvl Gtl 7040 Inside Reg 0 90 RL 216.0 BrkFace None 0 6 SawyerW 0 5 5 Y 0 None CompShg Gable Alloca WD 0 Pave 6 1094.0 AllPub 120 1979 1979 2009 NaN
2511 1743 0 0 None 0 1Fam Gd Gd 51.0 915.0 LwQ GLQ 2.0 0.0 Gd 0.0 Y Norm Norm SBrkr 0 TA Gd Wd Sdng Wd Sdng None Fa 2 CBlock 0 Typ 529.0 2.0 TA Fin TA Attchd 1976.0 1743 1 GasA Ex 1Story 2514 1 Gd Low Sev 20064 Inside IR1 0 20 RL 0.0 None None 0 5 ClearCr 0 6 8 Y 0 None WdShngl Shed Normal WD 0 Pave 5 966.0 AllPub 646 1976 1976 2007 NaN
2598 936 0 0 None 0 TwnhsE TA Av 16.0 904.0 Rec GLQ 2.0 0.0 Ex 0.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd None None 0 PConc 0 Typ 460.0 2.0 TA Fin TA Attchd 1996.0 936 1 GasA Ex SFoyer 2601 1 TA Lvl Gtl 6710 FR3 IR1 0 120 RM 134.0 BrkFace None 0 6 Mitchel 40 5 6 Y 0 None CompShg Gable Normal WD 0 Pave 3 920.0 AllPub 0 1996 1997 2007 NaN
In [237]:
data.loc[(data.FullBath==0) & (data.BsmtFullBath ==0)]
Out[237]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold SalePrice
596 1402 0 0 None 2 TwnhsE TA Av 0.0 0.0 Unf Unf 0.0 2.0 Ex 1258.0 Y Norm Norm SBrkr 0 TA Gd VinylSd VinylSd None Gd 1 PConc 0 Typ 648.0 3.0 TA Fin TA Attchd 2006.0 1402 2 GasA Ex 1Story 598 1 Gd Lvl Gtl 3922 Inside Reg 0 120 RL 72.0 BrkFace None 0 2 Blmngtn 16 5 7 Y 0 None CompShg Gable Partial New 0 Pave 7 1258.0 AllPub 120 2006 2007 2007 194201.0

These residences do have bathrooms, theyre just in the basement. I can assume these are solid data points

Even the one above has some half baths(I'm assuming it could be 3/4 baths?)

0 Bedrooms?

In [238]:
data.loc[(data.BedroomAbvGr==0)]
Out[238]:
1stFlrSF 2ndFlrSF 3SsnPorch Alley BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 BsmtFinType1 BsmtFinType2 BsmtFullBath BsmtHalfBath BsmtQual BsmtUnfSF CentralAir Condition1 Condition2 Electrical EnclosedPorch ExterCond ExterQual Exterior1st Exterior2nd Fence FireplaceQu Fireplaces Foundation FullBath Functional GarageArea GarageCars GarageCond GarageFinish GarageQual GarageType GarageYrBlt GrLivArea HalfBath Heating HeatingQC HouseStyle Id KitchenAbvGr KitchenQual LandContour LandSlope LotArea LotConfig LotShape LowQualFinSF MSSubClass MSZoning MasVnrArea MasVnrType MiscFeature MiscVal MoSold Neighborhood OpenPorchSF OverallCond OverallQual PavedDrive PoolArea PoolQC RoofMatl RoofStyle SaleCondition SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold SalePrice
53 1842 0 0 None 0 1Fam TA Gd 1810.0 0.0 GLQ Unf 2.0 0.0 Ex 32.0 Y Norm Norm SBrkr 0 TA Gd WdShing Wd Shng None Gd 1 CBlock 0 Typ 894.0 3.0 TA Fin TA Attchd 1981.0 1842 1 GasA Gd 1Story 54 1 Gd Low Gtl 50271 Inside IR1 0 20 RL 0.0 None None 0 11 Veenker 72 5 9 Y 0 None WdShngl Gable Normal WD 0 Pave 5 1842.0 AllPub 857 1981 1987 2006 385000.0
189 1593 0 0 None 0 TwnhsE TA Av 1153.0 0.0 GLQ Unf 1.0 0.0 Ex 440.0 Y Norm Norm SBrkr 0 TA Gd CemntBd CmentBd None Gd 1 PConc 1 Typ 682.0 2.0 TA Fin TA Attchd 2001.0 1593 1 GasA Ex 1Story 190 1 Ex Lvl Gtl 4923 Inside Reg 0 120 RL 0.0 None None 0 8 StoneBr 120 5 8 Y 0 None CompShg Gable Normal WD 224 Pave 5 1593.0 AllPub 0 2001 2002 2008 286000.0
633 1056 0 0 None 0 Duplex TA No 1056.0 0.0 GLQ Unf 2.0 0.0 TA 0.0 Y Norm Norm SBrkr 0 TA TA Plywood Plywood GdPrv None 0 CBlock 0 Typ 576.0 2.0 TA Unf TA Detchd 1980.0 1056 0 GasA Gd SFoyer 635 2 TA Lvl Gtl 6979 Inside Reg 0 90 RL 0.0 None Shed 600 6 OldTown 56 5 6 Y 0 None CompShg Gable Normal WD 0 Pave 4 1056.0 AllPub 264 1980 1980 2010 144000.0
1162 1258 0 0 None 0 Duplex TA Av 1198.0 0.0 GLQ Unf 2.0 0.0 Gd 0.0 Y Feedr Norm SBrkr 0 TA TA Plywood Plywood None None 0 CBlock 0 Typ 400.0 2.0 TA Unf Fa CarPort 1969.0 1258 2 GasA TA SFoyer 1164 2 TA Lvl Gtl 12900 Inside Reg 0 90 RL 0.0 None None 0 1 Sawyer 0 4 4 Y 0 None CompShg Gable Alloca WD 0 Pave 6 1198.0 AllPub 120 1969 1969 2008 108959.0
1212 960 0 0 None 0 1Fam Gd Av 648.0 0.0 GLQ Unf 1.0 1.0 TA 0.0 Y Norm Norm SBrkr 0 Gd TA VinylSd VinylSd None None 0 CBlock 0 Typ 364.0 1.0 TA Unf TA Attchd 1965.0 960 0 GasA Ex SLvl 1214 1 TA Lvl Gtl 10246 CulDSac IR1 0 80 RL 0.0 None None 0 5 Sawyer 0 9 4 Y 0 None CompShg Gable Normal WD 0 Pave 3 648.0 AllPub 88 1965 2001 2006 145000.0
1269 1332 192 0 None 0 1Fam TA Gd 1258.0 0.0 GLQ Unf 2.0 0.0 Gd 74.0 Y Norm Norm SBrkr 0 TA Gd Plywood Plywood None TA 1 PConc 0 Typ 586.0 2.0 TA Fin TA Attchd 1979.0 1524 1 GasA TA 1Story 1271 1 Gd Low Sev 23595 Inside Reg 0 40 RL 0.0 None None 0 4 ClearCr 0 6 7 Y 0 None WdShake Shed Normal WD 0 Pave 4 1332.0 AllPub 268 1979 1979 2010 260000.0
2511 1743 0 0 None 0 1Fam Gd Gd 51.0 915.0 LwQ GLQ 2.0 0.0 Gd 0.0 Y Norm Norm SBrkr 0 TA Gd Wd Sdng Wd Sdng None Fa 2 CBlock 0 Typ 529.0 2.0 TA Fin TA Attchd 1976.0 1743 1 GasA Ex 1Story 2514 1 Gd Low Sev 20064 Inside IR1 0 20 RL 0.0 None None 0 5 ClearCr 0 6 8 Y 0 None WdShngl Shed Normal WD 0 Pave 5 966.0 AllPub 646 1976 1976 2007 NaN
2598 936 0 0 None 0 TwnhsE TA Av 16.0 904.0 Rec GLQ 2.0 0.0 Ex 0.0 Y Norm Norm SBrkr 0 TA TA VinylSd VinylSd None None 0 PConc 0 Typ 460.0 2.0 TA Fin TA Attchd 1996.0 936 1 GasA Ex SFoyer 2601 1 TA Lvl Gtl 6710 FR3 IR1 0 120 RM 134.0 BrkFace None 0 6 Mitchel 40 5 6 Y 0 None CompShg Gable Normal WD 0 Pave 3 920.0 AllPub 0 1996 1997 2007 NaN

They all have fairly large FINISHED basements. This is acceptable

In [239]:
plotly_plot(data, 'FullBath', 'GrLivArea', go.Box)

Possible Outliers?

In [240]:
plotly_plot(data, 'GrLivArea', 'SalePrice',go.Scatter, mode='markers')

Clearly a trend here.

  • Are there outliers we need to get rid of?
    • I think we should get rid of any sqft > 4000 and any sale price > 500k
    • We need to do this on the train set! ^ up above I will do it #### For fun I'm going to add in LotArea and do a 3d
In [241]:
plotly_plot(data, 'GrLivArea', 'LotArea', go.Scatter3d, mode='markers', z=data['SalePrice'])
In [242]:
qual_sf_group = data.groupby('OverallQual', as_index=False )['GrLivArea'].agg('mean')
plotly_plot(qual_sf_group, 'OverallQual', 'GrLivArea', go.Bar)
In [243]:
# Curious if there are any houses where the basement is larger than the 1stFloorSF, that would be weird?
plotly_plot(data, '1stFlrSF', 'TotalBsmtSF', go.Scatter, mode='markers')

There are some, but theres a pretty linear relationship like I thought

In [244]:
# groupby to get average
month_sold_group = data.groupby('MoSold', as_index=False)['SalePrice'].agg('mean')
plotly_plot(month_sold_group, 'MoSold', 'SalePrice', go.Scatter, mode='lines')

Interesting, There is no real correlation between sale price and month, this also shows us that we need to get dummies on 'MoSold'

Going to add in min and max per month and see what that looks like

In [245]:
# setting up groupbys for the chart
max_sold_group = data.groupby('MoSold', as_index=False)['SalePrice'].agg('max')
avg_sold_group = data.groupby('MoSold', as_index=False)['SalePrice'].agg('mean')
min_sold_group = data.groupby('MoSold', as_index=False)['SalePrice'].agg('min')
# Create and style traces
trace0 = go.Scatter(
    x = max_sold_group['MoSold'],
    y = max_sold_group['SalePrice'],
    name = 'Max Sale Price',
    line = dict(
        color = ('rgb(76, 153, 0)'),
        width = 4)
)
trace1 = go.Scatter(
    x = avg_sold_group['MoSold'],
    y = avg_sold_group['SalePrice'],
    name = 'Avg Sale Price',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,)
)
trace2 = go.Scatter(
    x = min_sold_group['MoSold'],
    y = min_sold_group['SalePrice'],
    name = 'Min Sale Price',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4,
        dash = 'dash') # dash options include 'dash', 'dot', and 'dashdot'
)

plot = [trace0, trace1, trace2]

# Edit the layout
layout = dict(title = 'Min, Avg, and Max Sale Prices per month',
              xaxis = dict(title = 'Month'),
              yaxis = dict(title = 'Sale Price'),
              )

fig = dict(data=plot, layout=layout)
iplot(fig)

Interesting to see the variance in the max values, but mostly the average doesn't vary too much. Further proof we need to convert MoSold to a string and then take dummies

In [246]:
plotly_plot(data, 'YrSold', 'SalePrice', go.Box)

So Strange that YrSold does not effect sale price at all

  • Should change this to Categorical and take dummies!
In [247]:
plotly_plot(data, 'Neighborhood', 'SalePrice', go.Box)

Bubble Chart

Lastly I want to take a look at the distribution of SalePrice and possibly others, even though we already saw it in the pandas profile above, it would be interesting to have a better graph

Feature Engineering

  1. replacing year with 0 if yearremodadd=yrbuilt(should not have a value if never remodeled
  2. changing numeric col to str(to take dummies)
  3. TotalSF = GrlivArea + TotalBsmtSF
  4. GrLivArea/ LotArea
  5. Lot frontage/lot area

1. If yearremodadd = yearbuilt, replace value in yearremodadd with 0(according to docs this means there was no remodel)

  • Surprisingly this did not really do anythin
In [251]:
# if year built and remodadd are the same replace yearremodadd with 0. They should not have a value if theyve never been remodeled
data['YearRemodAdd']= np.where(data.YearRemodAdd == data.YearBuilt, 0, data.YearRemodAdd) 

2. Changing stuff to strings

In [252]:
# Change MoSold from int to a String so you can take dummies(based on chart above)
def turn_obj(cols):
    for col in cols:
        data[col] = data[col].astype(str)
In [253]:
turn_obj(['MoSold', 'YrSold', 'OverallCond', 'MSSubClass', 'GarageCars'])

3. Total Sq ft(including basement)

In [258]:
data['TotalSF'] = data['GrLivArea'] + data['TotalBsmtSF']

4. Sq ft divided by Lot Area

In [259]:
data['totalSF_by_LotArea'] = data['TotalSF'] / data['LotArea']

Remove ID!

In [261]:
# Need to drop SalePrice because it has NaNs and we dont want it in the algo to impute LotFrontage
# will create target again, just because
target = data['SalePrice']
missing_sales = data[data['SalePrice'].isnull()]
sub_id = missing_sales['Id']
# delete so its not used
#del data['Id']
data.drop(['SalePrice', 'Id'], axis=1,inplace=True)
# Bring in LotFrontage
data=data.join(lot_frontage)
In [262]:
data.groupby('Neighborhood')['LotFrontage'].agg('mean')
Out[262]:
Neighborhood
Blmngtn    46.900000
Blueste    27.300000
BrDale     21.500000
BrkSide    55.789474
ClearCr    88.150000
CollgCr    71.336364
Crawfor    69.951807
Edwards    65.153409
Gilbert    74.207207
IDOTRR     62.241379
MeadowV    25.606061
Mitchel    75.144444
NAmes      75.210667
NPkVill    28.142857
NWAmes     81.517647
NoRidge    91.629630
NridgHt    84.184049
OldTown    61.777293
SWISU      59.068182
Sawyer     74.551020
SawyerW    70.669811
Somerst    64.549383
StoneBr    62.173913
Timber     81.157895
Veenker    72.000000
Name: LotFrontage, dtype: float64

Try mean and median, see what works best

In [263]:
# Will first impute missing lotfrontage values with the mean based on the neighborhood
data['LotFrontage'] = data.groupby('Neighborhood')['LotFrontage'].transform(lambda x: x.fillna(x.mean()))
In [264]:
# 1 more piece of feature engineering
data['lotarea-frontage'] = data['LotFrontage'] / data['LotArea']

Is anything too correlated with each other?

In [265]:
plt.figure(figsize=(16,8))
sns.heatmap(data.corr(), annot=True)
Out[265]:
<matplotlib.axes._subplots.AxesSubplot at 0x279deb78550>

Using Labelencoding, as it has worked for mls data

In [268]:
from sklearn import preprocessing
cols_for_label = ['BsmtCond', 'BsmtQual', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2','Condition1', 
                  'Condition2', 'ExterQual', 'GarageCond', 'GarageQual', 'GarageType', 'KitchenQual', 'LotShape', 
                 'LotConfig', 'MiscFeature', 'PavedDrive', 'Functional', 'Fence', 'Alley', 'YearRemodAdd']
# loop to use labelencoder on the chosen columns
le = preprocessing.LabelEncoder()
for col in cols_for_label:
    le.fit(data[col])
    list(le.classes_)
    data[col] = le.transform(data[col])

Now to check distribution/ skewness

In [269]:
target_no_nan = target.dropna()
In [270]:
# this code taken from another user here on kaggle. Great stuff thank you!
def check_skewness(df):
    sns.distplot(df, fit = norm);
    fig =plt.figure(figsize=(16,8))
    res = stats.probplot(df, plot=plt)
    # get fitted parameters used by the function
    (avg, std) = norm.fit(df)
    print ('\n avg = {:.2f} and std = {:.2f}\n' .format(avg, std))
check_skewness(target_no_nan)
C:\Users\dusty\Anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning:

Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.

 avg = 180932.92 and std = 79467.79

In [271]:
target_no_nan = np.log1p(target_no_nan)

check_skewness(target_no_nan)
C:\Users\dusty\Anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning:

Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.

 avg = 12.02 and std = 0.40

Good Stuff

Looking at Skew For Features

In [272]:
num_feats = data.dtypes[data.dtypes != 'object'].index
#check skew
skewed_feats = data[num_feats].apply(lambda x:skew(x)).sort_values(ascending=False)
skewness = pd.DataFrame({'sKew':skewed_feats})
#skewness = skewness.drop(['price'])
skewness.head()
Out[272]:
sKew
MiscVal 21.939672
PoolArea 17.688664
LotArea 13.109495
Condition2 12.340989
LowQualFinSF 12.084539

Fixing Skew For Features

In [274]:
# Boxcox fix skew
skewness = skewness[abs(skewness) > .75]
print (skewness.shape[0])
from scipy.special import boxcox1p
skewed_features = skewness.index
lam = .15
for feat in skewed_features:
    data[feat] = boxcox1p(data[feat], lam)
53
In [275]:
data = data.join(target_no_nan)

Get Dummies

In [276]:
def get_dummies(df):
    future_drop = [col for col in df if df[col].dtype == 'O']
    # I know get dummies only takes Objects but if I don't do the list comp inside it gives me a columns overlap error
    df = df.join(pd.get_dummies(df[[col for col in df if df[col].dtype == 'O']], drop_first=True)).drop(future_drop, axis=1) 
    return df
    #df.drop(future_drop, axis=1, inplace=True)
#data = get_dummies(data)
In [277]:
data=get_dummies(data)

Modeling

  • Will try a couple models and then average them
In [278]:
from sklearn.linear_model import ElasticNet, Lasso, BayesianRidge, LassoLarsIC, LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler, MinMaxScaler, StandardScaler, Normalizer, MaxAbsScaler, FunctionTransformer
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, clone
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.metrics import mean_squared_error
import xgboost as xgb
from xgboost.sklearn import XGBRegressor
In [279]:
missing_price = data[data['SalePrice'].isnull()]
filled_price = data[data['SalePrice'].notnull()]
In [280]:
X_train, X_test, y_train, y_test = train_test_split(filled_price.drop('SalePrice', axis=1),filled_price['SalePrice'], test_size=.2, random_state=42)
In [302]:
# StandardScaler was almost identical to robust but gave warnings
# FunctionTransformer helped but not too noticeable
# RobustScaler worked the best
gbr=  make_pipeline(RobustScaler(),GradientBoostingRegressor(n_estimators=800, learning_rate=0.05,
                                  max_depth=4, max_features='log2',
                                  min_samples_leaf=8, min_samples_split=6,
                                  loss='huber', random_state=42))
br = make_pipeline(RobustScaler(),BayesianRidge())
r = make_pipeline(RobustScaler(),Ridge())
xgb = make_pipeline(RobustScaler(),XGBRegressor())
svr =make_pipeline(RobustScaler(),SVR(kernel='linear'))
# Lasso and enet were way off until I messed with alpha,possibly fine tuning will bring a better scores
l = make_pipeline(RobustScaler(),Lasso(alpha=.0005))
enet = make_pipeline(RobustScaler(), ElasticNet(alpha=.001))
In [303]:
# going to put cross_val in the tdmassess
def cv_score(algo):
    rmse= np.sqrt(-cross_val_score(algo, X_train, y_train, scoring='neg_mean_squared_error', cv=5))
    return (rmse.mean())
In [304]:
algorithms = [gbr, br, r, xgb, svr,l, enet]
names = ['Gradient Boosting', 'Bayesian Ridge', 'Ridge', 'XGB', 'SVR', 'Lasso','ElasticNet']
def tDMassess_regression():
    #fit the data
    for i in range(len(algorithms)):
        algorithms[i] = algorithms[i].fit(X_train,y_train)
    cv_rmse =[]
    rmse_train=[]
    rmse_test=[]
    for i in range(len(algorithms)):
        rmse_train.append(mean_squared_error(np.expm1(y_train), np.expm1(algorithms[i].predict(X_train))) **.5)
        rmse_test.append(mean_squared_error(np.expm1(y_test), np.expm1(algorithms[i].predict(X_test)))**.5)
        cv_rmse.append(cv_score(algorithms[i]))
    metrics = pd.DataFrame(columns =['RMSE_train', 'RMSE_test', 'cv_RMSE'], index=names)
    metrics['RMSE_train'] = rmse_train
    metrics['RMSE_test'] = rmse_test
    metrics['cv_RMSE'] = cv_rmse
    return metrics
In [305]:
tDMassess_regression()
Out[305]:
RMSE_train RMSE_test cv_RMSE
Gradient Boosting 11097.327918 19622.901485 0.118547
Bayesian Ridge 19516.157207 20201.897847 0.118962
Ridge 18228.267035 20574.464664 0.124067
XGB 15947.390443 21002.188710 0.128536
SVR 18454.451846 22029.556570 0.131941
Lasso 19961.627847 19610.959301 0.117945
ElasticNet 20039.274011 19628.842211 0.117798

Function to take the average of all the models

  • not going to use SVR
In [306]:
final_algs = [gbr, br, l,enet]
def average_of_models():
    final_pred=[]
    for i in range(len(final_algs)):
         final_pred.append(np.expm1(final_algs[i].predict(missing_price.drop('SalePrice', axis=1))))
    return (sum(final_pred)/len(final_algs))
In [307]:
avg_preds = average_of_models()

Best score on kaggle was .12425

  • No label encoder
  • and average of gbr, br, r, and xgb.

Now to Submit

In [308]:
submission = pd.DataFrame()
submission['Id'] =sub_id
submission['SalePrice'] = avg_preds
submission.to_csv('final_sub_LE_la_enet.csv', index=False)